Closed Bug 864940 Opened 7 years ago Closed 6 years ago

Test a registry script for automation registration

Categories

(Firefox for Metro Graveyard :: Tests, defect, P1)

x86_64
Windows 8.1
defect

Tracking

(Not tracked)

RESOLVED FIXED

People

(Reporter: jimm, Assigned: armenzg)

References

Details

(Whiteboard: [leave-open])

Attachments

(9 files, 1 obsolete file)

No description provided.
Attached file reset script
Attached file register script (obsolete)
Steps for testing:

- through default programs set IE as the default browser

- purge the system of all Firefox-related registrations using the reset script. You'll have to search HKEY_CLASSES_ROOT for additional app IDs and add them to the list of entries the script deletes.

- download the latest opt build of firefox (not the pgo build; there's a problem with those builds that I need to file a follow-up on):

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1366736193/firefox-23.0a1.en-US.win32.zip

- Copy the firefox folder into a new directory on the c drive:

c:\slave\test\build\application

- download the latest tests zip:

http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-central-win32/1366736193/firefox-23.0a1.en-US.win32.tests.zip

- from within the bin sub folder of the zip, copy metrotestharness.exe into the firefox sub folder from the previous step.

- run the register script attached here.

- open default programs and set 'MozillaTestBrowser' as the default browser

- open a cmd shell in the firefox folder above, and try running:

metrotestharness -url http://www.mozilla.org/
I know it's a pain but would you mind taking these steps for a spin? This registration script is going to go out to all of our test slaves, so we need to be sure it is working.
Flags: needinfo?(netzen)
Yup I'll take it for a spin on a VM of mine but it will likely not be until tomorrow.
Flags: needinfo?(netzen)
(In reply to Brian R. Bondy [:bbondy] from comment #5)
> Yup I'll take it for a spin on a VM of mine but it will likely not be until
> tomorrow.

np, thanks!
Attached file register script
Updated the path to CommandExecuteHandler.exe.
Attachment #740976 - Attachment is obsolete: true
Hi Jim, I followed the steps with your new attachment and it successfully launches the browser in the metro environment at the specified URL.
Attached patch patch
Thanks. Let's check these in so we have them in a repo.
Attachment #741361 - Flags: review?(netzen)
Attachment #741361 - Flags: review?(netzen) → review+
This can be directly imported into a registry preference GPO
Re-imaged t-w864-ix-001.wintest.releng.scl3.mozilla.com and applied the registry script and auto association script. The machine should be tested for general roll out.
Taking. I will test it on staging.
Assignee: jmathies → armenzg
Attached file log
I don't think this has worked. What do you think?
Attachment #746941 - Flags: feedback?(jmathies)
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #14)
> Created attachment 746941 [details]
> log
> 
> I don't think this has worked. What do you think?

No it didn't. 

TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254

0x80270254 is E_APPLICATION_NOT_REGISTERED. 

So something isn't connected up right. Can I get access to this slave so I can poke around a bit?
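For reference, a minimal Python sketch of how the hex value in these harness log lines breaks down as an HRESULT. This is not part of metrotestharness; the function names and the lookup table are assumptions for illustration, and the table only holds the code named in this bug plus E_ACCESSDENIED from the Windows SDK.

```python
import re

# 0x80270254 is named in this bug; 0x80070005 is the standard E_ACCESSDENIED.
KNOWN_HRESULTS = {
    0x80270254: "E_APPLICATION_NOT_REGISTERED",
    0x80070005: "E_ACCESSDENIED",
}

# Matches harness lines such as
# "TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254"
RESULT_RE = re.compile(r"\|\s*(\w+) result ([0-9A-Fa-f]{8})\s*$")


def decode_hresult(value):
    """Split a 32-bit HRESULT into its failure bit, facility and code fields."""
    return {
        "failed": bool(value & 0x80000000),
        "facility": (value >> 16) & 0x7FF,
        "code": value & 0xFFFF,
        "name": KNOWN_HRESULTS.get(value, "unknown"),
    }


def parse_result_line(line):
    """Return (api_name, decoded_hresult) for a harness result line, else None."""
    match = RESULT_RE.search(line)
    if match is None:
        return None
    return match.group(1), decode_hresult(int(match.group(2), 16))
```

Feeding the failing line above to parse_result_line() gives the API name ActivateApplication and an HRESULT with the failure bit set, facility 0x27 and code 0x254.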
Q, could you please do the magic for jimm to look at this machine?
Thanks!
Assignee: armenzg → q
(In reply to Q from comment #10)
> Created attachment 746788 [details]
> Reg file converted to GPO usable XML
> 
> This can be directly imported into a registry preference GPO

I don't see the AppID getting created here. I see it getting deleted (from the cleanup code in the original reg script) but it doesn't look like it gets recreated.

<Collection clsid="{53B533F5-224C-47e3-B01B-CA3B3F3FF4BF}" name="AppID">
<Collection clsid="{53B533F5-224C-47e3-B01B-CA3B3F3FF4BF}" name="{5100FEC1-212B-4BF5-9BF8-3E650FD794A3}">
<Registry clsid="{9CD4B2F4-923D-47f5-A062-E897DD1DAD50}" name="{5100FEC1-212B-4BF5-9BF8-3E650FD794A3}" status="{5100FEC1-212B-4BF5-9BF8-3E650FD794A3}" image="3" changed="2013-05-08 01:43:33" uid="{7A28A1D3-AF19-E121-FEE0-ECCF3B2481FB}">
<Properties action="D" displayDecimal="0" default="0" hive="HKEY_CLASSES_ROOT" key="AppID\{5100FEC1-212B-4BF5-9BF8-3E650FD794A3}" name="" type="REG_SZ" value=""/>
<Filters/>
</Registry>
</Collection>
</Collection>

I think action="D" means delete. 

More generally, we could remove all the delete code from this if it's screwing up the GPO script generation and just update existing data.
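A rough way to audit the converted file for this, sketched in Python. It assumes, as above, that action="D" marks a delete and that every item follows the Collection / Registry / Properties layout shown in the excerpt; find_delete_actions is a made-up helper name, not an existing tool.

```python
import xml.etree.ElementTree as ET


def find_delete_actions(gpo_xml):
    """Return (hive, key) pairs for every Registry item whose action is "D"."""
    root = ET.fromstring(gpo_xml)
    deletes = []
    # iter() walks nested <Collection> elements, so nesting depth does not matter.
    for registry in root.iter("Registry"):
        props = registry.find("Properties")
        if props is not None and props.get("action") == "D":
            deletes.append((props.get("hive"), props.get("key")))
    return deletes
```

Run against the full GPO XML, any key that shows up here but has no matching create or update entry is a candidate for the missing registration.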
Looks like CLSID\{5100FEC1-212B-4BF5-9BF8-3E650FD794A3} may have the same problem.
These entries may be propagated up from HKEY_LOCAL_MACHINE data, so I'm not sure. I guess it depends on the order this gets inserted.
Magic done, just in case jimm wants to take a look. Removing the delete actions sounds like a reasonable workaround, since the order of operations can be different in a GPO than in a straight reg script. I can change the delete actions in the GPO so we can test if need be, then update the script to match for posterity's sake.
So the machine in question is "t-w864-ix-001.wintest.releng.scl3.mozilla.com"? 

Usually to connect to a test slave I'd need RDP login info and an ip address I can plug into my vpn software.
That is the correct machine name, the ip is 10.26.40.31.
RDP is disabled in Windows 8; however, VNC works, and the "magic" Armen referred to resets the VNC and cltbld passwords. I will back-channel those to you.
Depends on: 870012
I have manually changed the Administrator's password to match the one of cltbld.
jimm needed it.

It nevertheless required me to re-activate the UAC prompts.
Not sure if this will invalidate any testing.
Hrm, so I'm not having any problems running tests after we rebooted this slave. But armen is still unable to get the tests running and is seeing the same registration error. 

Seems like this might have something to do with accounts. I'm running tests under cltbld, from a windows command prompt with cltbld permissions using:

C:\slave\test\build\venv\Scripts\python -u C:\slave\test\build\tests\mochitest\runtests.py --appname=C:\slave\test\build\application\firefox\firefox.exe --utility-path=tests/bin --extra-profile-file=tests/bin/plugins --certificate-path=tests/certs --close-when-done --autorun --console-level=INFO --browser-chrome --metro-immersive

What might be different between this and when automation tries to run the same tests?

One other interesting note - when I tried running tests from an admin prompt under cltbld the browser wouldn't start. From what I could tell the browser was trying to access a profile directory under the administrator's user directory which it didn't have access to. This resulted in a black screen after launch with no test runs.

However I haven't reproduced the E_APPLICATION_NOT_REGISTERED error at all.
Oh, also - 

c:\mozilla-build\python27\python -u scripts/scripts/desktop_unittest.py --cfg unittests/win_unittest.py --mochitest-suite metro-immersive --download-symbols ondemand

works as well under cltbld.
Hrm, looks like the reboot reset the admin pass. I was going to try running tests as admin under cltbld but the password isn't working.
(In reply to Jim Mathies [:jimm] from comment #26)
> Hrm, looks like the reboot reset the admin pass. I was going to try running
> tests as admin under ctlbld but the password isn't working.

I tried fixing the Administrator password again but I'm unable to :S
I'm really curious how test runs launch on these slaves. I think we are having problems with mixed account access. For example, does the process that launches a test run execute under the cltbld user or as an admin?
I don't have the last word but this is how I have seen it work before.

A task is added on the Admin account, which specifies that when the cltbld logs in, it will start a process with the highest privileges.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #29)
> I don't have the last word but this is how I have seen it work before.
> 
> A task is added on the Admin account, which specifies that when the cltbld
> logs in, it will start a process with the highest privileges.

By 'highest privileges' do you mean administrator level?  I'm curious what account privs the tests actually run under. From the tests I've run under this scenario we apparently have some access issues with the profile directory, and potentially with getting the browser launched.

If we're running tests while logged in as cltbld the tests should probably run as cltbld. We should confirm this is the case, and if not, try to get it working that way.
It is similar to this:
wget -OC:\\slave\\talosslave.xml "http://people.mozilla.com/~armenzg/win7/talosslave.xml"
schtasks /create /tn talosslave /xml "C:\slave\talosslave.xml"
<RunLevel>HighestAvailable</RunLevel></Principal>
I don't think it is necessarily as the Administrator user but with Admin like privileges.
I don't know exactly what Q runs on the win8 machines but it should be similar.
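To check what a given task definition actually asks for, something like the following Python sketch could read the RunLevel out of the task XML. It is only an illustration: task_run_levels is a hypothetical helper, and it assumes the XML text has already been read from a file such as talosslave.xml.

```python
import xml.etree.ElementTree as ET


def task_run_levels(task_xml):
    """Return the text of every <RunLevel> element in a Task Scheduler XML document."""
    root = ET.fromstring(task_xml)
    levels = []
    for element in root.iter():
        # Strip any "{namespace}" prefix so the check works with or without xmlns.
        local_name = element.tag.rsplit("}", 1)[-1]
        if local_name == "RunLevel":
            levels.append((element.text or "").strip())
    return levels
```

A result of ["HighestAvailable"] would match the fragment quoted above; "LeastPrivilege" would mean the task is not asking for elevation.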

As far as I know, there are some jobs that require high privileges. I don't remember which, but I could test without it.

Also to point out, we have been able to run the metro jobs on this same machine and with the same start up task. I don't know what is different this time.
I can try triggering the jobs while I'm VNCed into the machine.

What would you like me to try?
I'd suggest confirming this works for you under cltbld privs:

> c:\mozilla-build\python27\python -u scripts/scripts/desktop_unittest.py
> --cfg unittests/win_unittest.py --mochitest-suite metro-immersive
> --download-symbols ondemand

Then maybe try doing the same with automation to see if it acts differently. The only difference I can think of would be the account under which we launch the runs.
If you need admin privs try using "root" instead of "administrator" with the loaner password
We do run the tasks with registry changing and UAC elevated privileges. Originally this was done to try to combat the UAC prompts before we figured out that a certain executable was named with verboten windows words.
Attached patch log
I modified the task in the task manager and I ran buildbot without "highest privileges" and I got the same issue. It actually runs as "cltbld" user.

What else could we try?
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #35)
> Created attachment 747508 [details] [diff] [review]
> log
> 
> I modified the task in the task manager and I run buildbot without "highest
> privileges" and I got the same issue. It actually runs as "cltbld" user.
> 
> What else could we try?

WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | CoAllowSetForegroundWindow result 80070005

This is different. I think this fails because the command window isn't in the foreground. We can try to address this in the test harness.
(In reply to Jim Mathies [:jimm] from comment #36)
> (In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment
> #35)
> > Created attachment 747508 [details] [diff] [review]
> > log
> > 
> > I modified the task in the task manager and I run buildbot without "highest
> > privileges" and I got the same issue. It actually runs as "cltbld" user.
> > 
> > What else could we try?
> 
> WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe |
> CoAllowSetForegroundWindow result 80070005
> 
> This is different. I think this fails because the command window isn't in
> the foreground. We can try to address this in the test harness.

With your testing if you can make sure the test output console is in the foreground it will allow you to get farther in. I'll land a patch on central that makes this a non-fatal error.
Armen can you run again? I just double checked and made sure all of the registry changes were propagated with no deletes.
Hrm, I can't make this non-fatal. If the console trying to launch metrotestharness isn't in the foreground, it'll fail to launch the browser. :/
By having cmd in the foreground I can see the tests running.
I will take a screenshot once the machine reboots so we can see what is up and running.

10:48:28     INFO - TinderboxPrint: mochitest-metro-immersive<br/>550/<em class="testfail">16</em>/1
Attached patch focus patch
Attachment #747520 - Flags: review?(netzen)
So we can touch this up somewhat.

What's the state of these machines before tests run? Do they have the desktop loaded and displayed or do they have the immersive interface displayed?
Whiteboard: [leave-open]
The CMD prompt is always on the back behind the Libraries window:
http://cl.ly/OrZ6
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #43)
> The CMD prompt is always on the back behind the Libraries window:
> http://cl.ly/OrZ6

Seems kind of random, I wonder why that library window is open? Regardless, if it always starts out this way then explorer will have the foreground focus and we don't need to make the call that's failing. My patch accomplishes this.
The Libraries window opens because of a hack to "Show the desktop": http://www.chrisnackers.com/2013/02/06/windows-8-show-desktop-at-logon/
(In reply to Q from comment #45)
> The library windows open because of a hack to "Show the desktop":
> http://www.chrisnackers.com/2013/02/06/windows-8-show-desktop-at-logon/

Ok, that's actually good - having explorer in the foreground solves the problem since explorer launches metrofx. The patch I posted will need to land and then we can retest.

Sounds like we are almost there!
Attachment #747520 - Flags: review?(netzen) → review+
I've triggered that changeset on staging. I will review the results in the morning.
I've got this:
05:08:22     INFO -  INFO | automation.py | SSL tunnel pid: 1132
05:08:22     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\metrotestharness.exe', '-no-remote', '-profile', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmpqdg3wo/', 'about:blank', '-firefoxpath', 'C:\\slave\\test\\build\\application\\firefox\\firefox.exe']
05:08:22     INFO -  INFO | automation.py | Application pid: 3852
05:08:22     INFO -  INFO | metrotestharness.exe | firefoxpath: 'C:\slave\test\build\application\firefox\firefox.exe'
05:08:22     INFO -  INFO | metrotestharness.exe | args: '-no-remote -profile c:\users\cltbld~1.t-w\appdata\local\temp\tmpqdg3wo/ about:blank'
05:08:22     INFO -  INFO | metrotestharness.exe | Launching browser...
05:08:22     INFO -  INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3'
05:08:22     INFO -  INFO | metrotestharness.exe | Harness process id: 3852
05:08:22     INFO -  INFO | metrotestharness.exe | Writing out tests.ini to: 'C:\slave\test\build\application\firefox\tests.ini'
05:08:22  WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254
05:08:22     INFO -  INFO | metrotestharness.exe | Deleting C:\slave\test\build\application\firefox\tests.ini
05:08:22     INFO -  INFO | automation.py | Application ran for: 0:00:00.668000
05:08:22     INFO -  INFO | zombiecheck | Reading PID log: c:\users\cltbld~1.t-w\appdata\local\temp\tmp7dcfsupidlog
05:08:23     INFO -  SUCCESS: The process with PID 1132 has been terminated.
05:08:23     INFO -  ERROR: The process with PID 4004 could not be terminated.
05:08:23     INFO -  Reason: There is no running instance of the task.
05:08:23     INFO -  SUCCESS: The process with PID 2096 has been terminated.
05:08:23     INFO -  WARNING | leakcheck | refcount logging is off, so leaks can't be detected!
05:08:23     INFO -  INFO | runtests.py | Running tests: end.
05:08:24     INFO - Return code: 0
05:08:24     INFO - TinderboxPrint: mochitest-metro-immersive<br/><em class="testfail">T-FAIL</em>
05:08:24  WARNING - # TBPL WARNING #
05:08:24  WARNING - The mochitest suite: metro-immersive ran with return status: WARNING
05:08:24     INFO - Copying logs to upload dir...
05:08:24     INFO - mkdir: C:\slave\test\build\upload\logs
So now we are back to 0x80270254 is E_APPLICATION_NOT_REGISTERED. Something is different between our manual runs of desktop_unittest.py and automation runs. My guess is it's permissions/accounts related.
One thing I can confirm from the log is that automation is running under the right account, since the temp profile is in cltbld's user data folder:

c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmpqdg3wo/

But it might be running with elevated permissions.

If I can get the admin user/pass and cltbld's pass I can go in and try to reproduce what automation gets via a manual run.
The admin user is root. The password is the same as cltbld.

FTR, I moved the cmd window to the foreground and it passed.

Is it possible to move the CMD windows always to the foreground? or minimize the library window?
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #53)
> The admin user is root. The password is the same as cltbld.
> 
> FTR, I moved the cmd window to the foreground and it passed.
> 
> Is it possible to move the CMD windows always to the foreground? or minimize
> the library window?

If the cmd window isn't in the foreground, it can't steal focus. If the Libraries window is in the foreground, explorer has the focus and shouldn't have any issues launching the browser. The error was "not registered", so I'm confused as to why foreground status on the cmd window plays into this.
Also, if Libraries is in the foreground and potentially on top of desktop firefox when we run tests, I wonder how that might affect test or talos runs. We might want to file a bug on that to confirm firefox is fully displayed.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #53)
> The admin user is root. The password is the same as cltbld.
> 
> FTR, I moved the cmd window to the foreground and it passed.
> 
> Is it possible to move the CMD windows always to the foreground? or minimize
> the library window?

'root' / cltbld's pass didn't authenticate.

I was able to run again via a cltbld command window with the Library window in the foreground and everything worked.

The only part of this I can do manually is testslave.py invoking desktop_unittest.py. Running desktop_unittest.py manually from the desktop works fine.
At the end of the road I believe we reach Process inside of _dumbwin32proc.py:
https://etherpad.mozilla.org/UeEIkhYgKk
http://hg.mozilla.org/build/twisted/file/3bdb54e31023/twisted/internet/_dumbwin32proc.py#l105

For the record, the _dumbwin32proc.py on this slave is slightly different than all the other machines.
This was pointed out on bug 853609#c6.
It only helps buildbot to kill processes. Without it buildbot cannot kill processes.
I fixed it manually as of now (not sure if a reboot will take it away).

I've triggered the task and put the library directory in the front.
Let's see what happens.
It seems to be running.
Let me get out of the machine and wait for another job after the reboot.
I will check what version of _dumbwin32proc.py is on the machine.
After rebooting I had the same problem. _dumbwin32proc.py is patched.

It seems that if I start the task manually it works. 
Unless starting the task manually and then bringing the Desktop window to the foreground is not exactly the same thing.
I did something different this time.

After a reboot, a failed job, I decided to just drag the desktop window a little to the side (just click and drag).
After doing that action the next job succeeded.

FTR, here's where we are failing to activate:
http://mxr.mozilla.org/mozilla-central/source/browser/metro/shell/testing/metrotestharness.cpp#292
Is the error code significant?
Can we add debug information on the log? Like which user we're logged in as? or available privileges?
I have removed rebooting out of the equation.
I'm going to trigger several jobs in a row and see if the first one fails and the following ones succeed.

After that I can receive suggestions on what to try next as I feel that I'm getting tunnel vision.
It has worked the last few times. I will add rebooting back.
Maybe patching _dumbwin32proc.py worked and I got confused on comment 59?
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #61)
> Is the error code significant?
> Can we add debug information on the log? Like which user we're logged in as?
> or available privileges?

Not much information on it. Google has a note in their code about it similar to our assumption that it gets returned when the browser is not set as the default.

The focus stuff is really weird. If there was a registration problem, you would expect it to happen every time, even when run manually.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #63)
> It has worked the last few times. I will add rebooting back.
> Maybe patching _dumbwin32proc.py worked and I got confused on comment 59?

Or maybe the initial startup on reboot is the problem. What did you have to do to get this machine to work? Go in and play with window focus?
The things that I have done differently are:
* patch _dumbwin32proc.py
* play with Window focus

Out of the last 8 runs 2 have failed.

Do you think we could take a screenshot at the beginning of the run to compare things?

I don't know what to do or what to try.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #66)
> The things that I have done differently are:
> * patch _dumbwin32proc.py
> * play with Window focus
> 
> Out of the last 8 runs 2 have failed.
> 
> Do you think we could take a screenshot at the beginning of the run to
> compare things?
> 
> I don't know what to do or what to try.

During these runs, what are the steps the slave took? Did it reboot for each run for example? 

Also, do we have logs for the two failures?
I've put the two logs in here:
http://people.mozilla.com/~armenzg/metro

After each job we reboot.

The steps are the same every time.
Checkout mozharness and run scripts/scripts/desktop_unittest.py --cfg unittests/win_unittest.py --mochitest-suite metro-immersive --download-symbols ondemand
which eventually runs this:
C:\\slave\\test\\build\\venv\\Scripts\\python', '-u', 'C:\\slave\\test\\build\\tests\\mochitest/runtests.py', '--appname=C:\\slave\\test\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=http://ftp.mozilla.org/pub/mozilla.org/firefox/tinderbox-builds/mozilla-inbound-win32/1368124676/firefox-23.0a1.en-US.win32.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--autorun', '--close-when-done', '--console-level=INFO', '--browser-chrome', '--metro-immersive'

I'm running against the same build.
Q, curious, with your registration scripts, does anything run on startup/login that might delay enough such that it could cause a test run startup failure due to missing registration?

Also, FYI Armen, I'd like to look at the registration on this machine again. However this week I'm in Vancouver for a work week so I won't be able to dig into this more until I get back home next week.
Flags: needinfo?(q)
(In reply to Jim Mathies [:jimm] from comment #69)

Jim, registration should happen before any user action can take place. I can do some debugging to be 100% sure, but it should not be an issue.
Flags: needinfo?(q)
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #66)
> The things that I have done differently are:
> * patch _dumbwin32proc.py
> * play with Window focus
> 
> Out of the last 8 runs 2 have failed.
> 
> Do you think we could take a screenshot at the beginning of the run to
> compare things?
> 
> I don't know what to do or what to try.

So were these ten runs stand alone or did you have to go in and play with the machine? Can we do some sort of automated test where we run the tests - reboot - repeat for ~20 runs without messing with the slave to see what the failure rate is?
(In reply to Jim Mathies [:jimm] from comment #71)
> (In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment
> #66)
> > The things that I have done differently are:
> > * patch _dumbwin32proc.py
> > * play with Window focus
> > 
> > Out of the last 8 runs 2 have failed.
> > 
> > Do you think we could take a screenshot at the beginning of the run to
> > compare things?
> > 
> > I don't know what to do or what to try.
> 
> So were these ten runs stand alone or did you have to go in and play with
> the machine? Can we do some sort of automated test where we run the tests -
> reboot - repeat for ~20 runs without messing with the slave to see what the
> failure rate is?

I did not mess with any of those.
I will go and queue a lot of them where the machine would go straight into taking jobs (rather than me triggering a couple of jobs manually every few minutes since there is job coalescing - I will disable it).
To see if this is a timing issue, we could land a debug patch that retries ActivateApplication over a period of 30 seconds or so, and see if it improves the success rate. Do we think this would be useful?
Alternatively, is there any way we can delay test startup on this slave for testing purposes? I think there's already an existing 30 second timeout, maybe we could up that to a minute or two to see if it helps?
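The retry idea, sketched in Python purely to pin down the behaviour being proposed. The real harness is C++, so this is not the debug patch itself; retry_until and its parameters are hypothetical, and action stands in for a call that returns True once ActivateApplication succeeds.

```python
import time


def retry_until(action, timeout=30.0, interval=1.0,
                clock=time.monotonic, sleep=time.sleep):
    """Call action() until it returns True or `timeout` seconds have elapsed.

    Returns (succeeded, attempts).
    """
    deadline = clock() + timeout
    attempts = 0
    while True:
        attempts += 1
        if action():
            return True, attempts
        # Stop if another wait would take us past the deadline.
        if clock() + interval > deadline:
            return False, attempts
        sleep(interval)
```

Logging the attempt count on success would tell us whether registration becomes available late (a timing issue) or never shows up at all within the window.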
I can manually adjust the file. I hope PGO would not overwrite it.
Let me try it.
I have bumped the sleep step from 30 to 60 and have re-triggered a lot of consecutive jobs.

We had one more failure: 07:29:52  WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254

End time of the previous job was 06:52:23 2013
Start time of the failing job was 07:27:11 2013

This means that the machine was up and waiting without taking a job for almost 30 minutes.

Could we dump expected information from the registry at the beginning of the run?
Adding more time before taking a job has not helped.

I would like to see the retrying of ActivateApplication to see if it fixes the issue.
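One possible shape for the registry dump, as a Python sketch. The two keys are the AppID and CLSID entries mentioned earlier in this bug; whether they are the right or complete set to check is an assumption, and build_query_commands and dump_registration are hypothetical helpers rather than existing mozharness code.

```python
import subprocess

# Keys named earlier in this bug; a real check would likely need more entries.
EXPECTED_KEYS = [
    r"HKCR\AppID\{5100FEC1-212B-4BF5-9BF8-3E650FD794A3}",
    r"HKCR\CLSID\{5100FEC1-212B-4BF5-9BF8-3E650FD794A3}",
]


def build_query_commands(keys=None):
    """Return one `reg query` argument list per registry key."""
    if keys is None:
        keys = EXPECTED_KEYS
    return [["reg", "query", key] for key in keys]


def dump_registration(run=subprocess.run):
    """Run each query and return {key: found} as booleans.

    Only meaningful on Windows, where reg.exe exits non-zero for a missing key.
    """
    results = {}
    for command in build_query_commands():
        completed = run(command, capture_output=True, text=True)
        results[command[2]] = completed.returncode == 0
    return results
```

Printing the result at the start of each run would show whether a failing run began with the keys already missing.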
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #76)
> I have bumped the sleep step from 30 to 60 and have re-triggered a lot of
> consecutive jobs.
> 
> We had one more failure: 07:29:52  WARNING -  TEST-UNEXPECTED-FAIL |
> metrotestharness.exe | ActivateApplication result 80270254

One failure out of how many runs? I'm curious how often this happens.

> End time of the previous job was 06:52:23 2013
> Start time of the failing job was 07:27:11 2013
> 
> This means that the machine was up and waiting without taking a job for
> almost 30 minutes.

So the machine was rebooted, cltbld logged in, and it sat idle for thirty minutes before a test run was initiated?

> Could we dump expected information from the registry at the beginning of the
> run?
> Adding more time before taking a job has not helped.
> 
> I would like to see the retrying of ActiveApplication to see if it fixes the
> issue.

Maybe the next step should be to push runs to the slave until it fails, so we know we have a failed setup, and then go in and take a look at the slave config to try and figure out what's wrong. Maybe we can try to get this set up on Monday once I'm back from my work week.
(In reply to Jim Mathies [:jimm] from comment #77)
> (In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment
> #76)
> > I have bumped the sleep step from 30 to 60 and have re-triggered a lot of
> > consecutive jobs.
> > 
> > We had one more failure: 07:29:52  WARNING -  TEST-UNEXPECTED-FAIL |
> > metrotestharness.exe | ActivateApplication result 80270254
> 
> One failure out of how many runs? I'm curious how often this happens.
If looking since May 10th, 4 failures out of 15 runs.
Not sure if it matters, but 2 runs failed in a row, then there were many successful runs, and then again 2 failures in a row. The start and end times in between do not give us any indication of what the reason could be (sitting idle, maybe a day passing since the last job, and so on).

> 
> > End time of the previous job was 06:52:23 2013
> > Start time of the failing job was 07:27:11 2013
> > 
> > This means that the machine was up and waiting without taking a job for
> > almost 30 minutes.
> 
> So the machine was rebooted, ctlbld logged in and it sitting idle for thirty
> minutes before a test run was initiated?
> 
Correct.

> > Could we dump expected information from the registry at the beginning of the
> > run?
> > Adding more time before taking a job has not helped.
> > 
> > I would like to see the retrying of ActiveApplication to see if it fixes the
> > issue.
> 
> Maybe the next step should be to push runs to the slave until it fails, so
> we know we have a failed setup, and then go in and take a look at the slave
> config to try and figure out what's wrong. Maybe we can try to get this set
> up on Monday once I'm back from my work week.
Do you mean manually?
> > Maybe the next step should be to push runs to the slave until it fails, so
> > we know we have a failed setup, and then go in and take a look at the slave
> > config to try and figure out what's wrong. Maybe we can try to get this set
> > up on Monday once I'm back from my work week.
>
> Do you mean manually?

Well, I have to defer to you on that. We want to look over the config on the slave after an automated test run failure. Things we would want to look at / test:

- inspect browser registration
- manually run using a command console to see if the failure is reproducible.
- maybe initiate another automated run while we're on the machine to look for something unique.
jimm, I could try to hack the code so it prevents the machine from rebooting once it fails. I'm off tomorrow; I could take a stab at it on Friday.
Assignee: q → armenzg
Priority: -- → P1
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #80)
> jimm, I could try to hack the code so it prevents the machine from rebooting
> once it fails. I'm off tomorrow; I could take a stab at it on Friday.

Did you have any luck with this?

I'm back from my work week so I have vpn access again.
(In reply to Jim Mathies [:jimm] from comment #81)
> (In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment
> #80)
> > jimm, I could try to hack the code so it prevents the machine from rebooting
> > once it fails. I'm off tomorrow; I could take a stab at it on Friday.
> 
> Did you have any luck with this?
> 
> I'm back from my work week so I have vpn access again.

I ended up getting derailed adding WinXP ix machines to the production pool.
I'm at my work week this week. I am flying today. I will see what I can do this week about this.
I'm at a work week this week and I am focusing on my breaks to fix the iX test infra refresh project. I don't know how much time I can really spend on this bug this week :S
Attachment #746941 - Flags: feedback?(jmathies)
Hi jimm,
Things are settling down for me.
What would you like me to try? Prevent the machine from rebooting if it fails so we can look at it?
Blocks: 864801
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #84)
> Hi jimm,
> Things are settling down for me.
> What would you like me to try? Prevent the machine from rebooting if it
> fails so we can look at it?

Yes I think that would be a good next step. Let's see if we can reproduce the registration problem after a test run has failed w/out the reboot, so we are testing with the same config the failed test runs under. If we can then it should be pretty easy to diagnose the configuration problem.
How goes the testing armen?
(In reply to Jim Mathies [:jimm] from comment #86)
> How goes the testing armen?

I have not had any luck in figuring out a way of doing this with buildbot easily.
I'm going to disable rebooting and just check every five minutes.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #87)
> (In reply to Jim Mathies [:jimm] from comment #86)
> > How goes the testing armen?
> 
> I have not had any luck in figuring out a way of doing this with buildbot
> easily.
> I'm going to disable rebooting and just check every five minutes.

Ok, this sounds like a good test I think. Basically you're going to let this do multiple test runs for a period of time without a reboot. If we have a working setup and we don't reboot and run multiple test runs, it will be interesting to see if we get registration failures. Is this your plan? (From your posted log I see one successful test run so far.)

Question: on which tree is this running? Cedar? Should we be merging over to cedar to trigger these runs? If so I can do merges all weekend long to trigger lots of them.

If this is your plan, and if we don't get any registration failures on any test run startup, that means we are dealing with some sort of a sporadic reboot config problem, right?
It is not running on any tree but on one machine that I have on staging.

There's also another option.

What about if we deployed Q's change as-is?
I could then enable metro jobs on certain branches and we could re-trigger them if we need to.

Worst comes to worst we have to undo Q's change or modify it.


This is what I'm currently doing:
* trigger job on buildbot
* wait 5 minutes
* if the job has *not* failed, I reboot and trigger another job
* if the job fails, I will see it in the log and let you know so you can jump on the machine and have a look
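
The manual loop above could be sketched roughly as follows. This is only an illustration: `trigger_job`, `job_failed` and `reboot` are hypothetical callables standing in for whatever is done by hand on the staging machine, not buildbot APIs.

```python
import time


def run_until_failure(trigger_job, job_failed, reboot,
                      wait_seconds=300, max_runs=20, sleep=time.sleep):
    """Repeat trigger/wait/check until a job fails.

    Returns the 1-based number of the first failing run, or None if
    max_runs jobs completed without a failure. All three callables are
    hypothetical placeholders for the manual steps.
    """
    for run in range(1, max_runs + 1):
        trigger_job()
        sleep(wait_seconds)  # five minutes by default
        if job_failed():
            # Do not reboot: leave the machine in the failed state so
            # someone can log in and inspect the registration.
            return run
        reboot()
    return None
```

The point of returning before `reboot()` is that the failed configuration stays on the machine for inspection.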
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #90)
> It is not running on any tree but on one machine that I have on staging.
> 
> There's also another option.
> 
> What about if we deployed Q's change as-is?
> I could then enable metro jobs on certain branches and we could re-trigger
> them if we need to.
> 
> Worst comes to worst we have to undo Q's change or modify it.
> 
> 
> This is what I'm currently doing:
> * trigger job on buildbot
> * wait 5 minutes
> * if the job has *not* failed, I reboot and trigger another job
> * if the job fails, I will see it in the log and let you know so you can
> jump on the machine and have a look

Ok, this sounds good, let's try this first. I'm hesitant about rolling out major machine config changes that we haven't validated yet, even if reverting the change is known to be possible.
On another note, is there a way we can get some info dumped to know why it did not activate? Or retry a few times?

I have run this 9 times and none of them have failed. Perhaps we could turn the job into an automatic RETRY, so that another machine would take the job if we match the activation failure.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #92)
> On another note, is there a way we can get some info dumped to know why it
> did not activate? Or retry a few times?

Some things I can do here for debugging purposes:

1) dumping a bunch of registry related info to logs to cross check what the automatic config is supposed to be doing before the user logs in.

2) I can retry the call after a short wait for debugging purposes. This wouldn't be a valid fix though since test runs are on a time out.

3) Validate that target browser files are in place, for example firefox.exe.

I can't really think of much else. 
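
As a rough illustration of items 2 and 3 (not the actual harness code, which is a native executable), a retry after a short wait and a check that the browser files are present might look like this in Python. `activate` is a hypothetical callable returning an HRESULT-style integer where 0 means success; the retry count, the wait and the log format are assumptions.

```python
import os
import time


def missing_browser_files(app_dir, required=("firefox.exe",)):
    """Return the names in `required` that are not files inside app_dir."""
    return [name for name in required
            if not os.path.isfile(os.path.join(app_dir, name))]


def activate_with_retry(activate, retries=3, wait_seconds=5,
                        sleep=time.sleep, log=print):
    """Call `activate` up to `retries` times, waiting between failures.

    Returns 0 on the first success, otherwise the last failing result.
    """
    result = None
    for attempt in range(retries):
        result = activate()
        if result == 0:
            return result
        log("ActivateApplication (retry %d) result %08x" % (attempt, result))
        sleep(wait_seconds)
    return result
```

Because test runs are on a timeout, the total wait (`retries * wait_seconds`) has to stay small; this is a debugging aid, not a fix.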

> I have run this 9 times and none of them have failed.

Hmm, so what do you think is different between what you are doing and what happens when this is automated?

> Perhaps we could turn
> the job to be an automatic RETRY and another machine would take the job if
> we match the activation failure.

I'm not sure what this means, but if the result is our ability to go in and look at a slave setup that has failed - sounds good to me.
(In reply to Jim Mathies [:jimm] from comment #93)
> (In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment
> #92)
> > On another note, is there a way we can get some info dumped to know why it
> > did not activate? Or retry a few times?
> 
> Some things I can do here for debugging purposes:
> 
> 1) dumping a bunch of registry related info to logs to cross check what the
> automatic config is supposed to be doing before the user logs in.
> 
> 2) I can retry the call after a short wait for debugging purposes. This
> wouldn't be a valid fix though since test runs are on a time out.
> 
> 3) Validate that target browser files are in place, for example firefox.exe.
> 
> I can't really think of much else. 
>
If you could do any of these it would be great.

> 
> > I have run this 9 times and none of them have failed.
> 
> Hmm, so what do you think is different between what you are doing and what
> happens when this is automated?
> 
The only difference is that it does not reboot automatically at the end of the job.
I do the rebooting after I inspect the results.
TBH, the failure has never happened often.

> > Perhaps we could turn
> > the job to be an automatic RETRY and another machine would take the job if
> > we match the activation failure.
> 
> I'm not sure what this means, but if the result is our ability to go in and
> look at a slave setup that has failed - sounds good to me.
>

Have you ever seen a blue job on tbpl? (either blue or purple)
The blue jobs are considered "infra known failures" which automatically re-trigger the job on another machine.
For instance, if there is a network blip, hg returns a 404, and so on.
What I'm suggesting is to make the job automatically re-trigger if we fail to activate the app.
> If you could do any of these it would be great.

I'll put something together and we'll get it landed beginning of the week.

> Have you ever seen a blue job on tbpl? (either blue or purple)
> The blue jobs are considered "infra known failures" which automatically
> re-trigger the job on another machine.
> For instance, if there is a network blip, hg returns a 404, and so on.
> What I'm suggesting is to make the job automatically re-trigger if we fail
> to activate the app.

Are you suggesting doing this for production and rolling this out? If so that's not a call I can make, since it'll suck releng resources until we get it fixed. We should also be sure we can backout / update Q's work.
(In reply to Jim Mathies [:jimm] from comment #95)
> > If you could do any of these it would be great.
> 
> I'll put something together and we'll get it landed beginning of the week.

Can you work off a try build with this or do you need it checked into the tree?
I can make it work from try.
Hey Armen, is that slave currently free? I'd like to get on it and do some testing.
Ok here's rev 1 with a bunch of registry config debug code in it, plus a five second retry that retries three times. 

https://tbpl.mozilla.org/?tree=Try&rev=ef8aa4c447f6

I just sent it over so should be available in a couple hours.
(In reply to Jim Mathies [:jimm] from comment #99)
> Ok here's rev 1 with a bunch of registry config debug code in it, plus a
> five second retry that retries three times. 
> 
> https://tbpl.mozilla.org/?tree=Try&rev=ef8aa4c447f6
> 
> I just sent it over so should be available in a couple hours.

builds ready - 

http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmathies@mozilla.com-ef8aa4c447f6/try-win32/
(In reply to Jim Mathies [:jimm] from comment #101)
> (In reply to Jim Mathies [:jimm] from comment #99)
> > Ok here's rev 1 with a bunch of registry config debug code in it, plus a
> > five second retry that retries three times. 
> > 
> > https://tbpl.mozilla.org/?tree=Try&rev=ef8aa4c447f6
> > 
> > I just sent it over so should be available in a couple hours.
> 
> builds ready - 
> 
> http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmathies@mozilla.
> com-ef8aa4c447f6/try-win32/

I have started running this on staging.
The last job that we ran on this machine was on June 3rd.
After that we rebooted a couple of times.
After those reboots, the first of 13 runs failed.
The remaining 12 jobs have succeeded.


08:45:38     INFO - #####
08:45:38     INFO - ##### Running run-tests step.
08:45:38     INFO - #####
08:45:38     INFO - Running pre test command run mouse & screen adjustment script with 'c:\mozilla-build\python27\python.exe ../scripts/external_tools/mouse_and_screen_resolution.py --configuration-url http://hg.mozilla.org/%(repo_path)s/raw-file/%(revision)s/testing/machine-configuration.json'
08:45:38     INFO - Running command: ['c:\\mozilla-build\\python27\\python.exe', '../scripts/external_tools/mouse_and_screen_resolution.py', '--configuration-url', u'http://hg.mozilla.org/integration/mozilla-inbound/raw-file/default/testing/machine-configuration.json'] in C:\slave\test\build
08:45:38     INFO - Copy/paste: c:\mozilla-build\python27\python.exe ../scripts/external_tools/mouse_and_screen_resolution.py --configuration-url http://hg.mozilla.org/integration/mozilla-inbound/raw-file/default/testing/machine-configuration.json
08:45:38     INFO -  INFO: This script was written to be used with Windows 7 32-bit machines.
08:45:38     INFO - Return code: 0
08:45:38     INFO - #### Running mochitest suites
08:45:38     INFO - ENV: MINIDUMP_STACKWALK is now C:\slave\test\build/tools/breakpad/win32/minidump_stackwalk.exe
08:45:38     INFO - ENV: MINIDUMP_SAVE_PATH is now C:\slave\test\build/../minidumps
08:45:38     INFO - Running command: ['C:\\slave\\test\\build\\venv\\Scripts\\python', '-u', 'C:\\slave\\test\\build\\tests\\mochitest/runtests.py', '--appname=C:\\slave\\test\\build\\application\\firefox\\firefox.exe', '--utility-path=tests/bin', '--extra-profile-file=tests/bin/plugins', '--symbols-path=http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmathies@mozilla.com-ef8aa4c447f6/try-win32/firefox-24.0a1.en-US.win32.crashreporter-symbols.zip', '--certificate-path=tests/certs', '--autorun', '--close-when-done', '--console-level=INFO', '--browser-chrome', '--metro-immersive'] in C:\slave\test\build
08:45:38     INFO - Copy/paste: C:\slave\test\build\venv\Scripts\python -u C:\slave\test\build\tests\mochitest/runtests.py --appname=C:\slave\test\build\application\firefox\firefox.exe --utility-path=tests/bin --extra-profile-file=tests/bin/plugins --symbols-path=http://ftp.mozilla.org/pub/mozilla.org/firefox/try-builds/jmathies@mozilla.com-ef8aa4c447f6/try-win32/firefox-24.0a1.en-US.win32.crashreporter-symbols.zip --certificate-path=tests/certs --autorun --close-when-done --console-level=INFO --browser-chrome --metro-immersive
08:45:39     INFO -  INFO | runtests.py | Installing extension at C:\slave\test\build\tests\mochitest\extensions\specialpowers to c:\users\cltbld~1.t-w\appdata\local\temp\tmptun0sm.
08:45:40     INFO -  INFO | runtests.py | Installing extension at C:\slave\test\build\tests\mochitest\extensions\worker to c:\users\cltbld~1.t-w\appdata\local\temp\tmptun0sm.
08:45:40     INFO -  INFO | runtests.py | Installing extension at C:\slave\test\build\tests\mochitest\extensions\workerbootstrap to c:\users\cltbld~1.t-w\appdata\local\temp\tmptun0sm.
08:45:40     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\xpcshell.exe', '-g', 'C:\\slave\\test\\build\\application\\firefox', '-v', '170', '-f', './httpd.js', '-e', "const _PROFILE_PATH = 'c:\\\\users\\\\cltbld~1.t-w\\\\appdata\\\\local\\\\temp\\\\tmptun0sm';const _SERVER_PORT = '8888'; const _SERVER_ADDR = '127.0.0.1';\n                     const _TEST_PREFIX = undefined; const _DISPLAY_RESULTS = false;", '-f', './server.js']
08:45:40     INFO -  INFO | runtests.py | Server pid: 4032
08:45:42     INFO -  args: ['C:\\slave\\test\\build\\venv\\Scripts\\python.exe', 'C:\\slave\\test\\build\\tests\\mochitest\\pywebsocket_wrapper.py', '-p', '9988', '-w', 'C:\\slave\\test\\build\\tests\\mochitest', '-l', 'C:\\slave\\test\\build\\tests\\mochitest\\websock.log', '--log-level=debug', '--allow-handlers-outside-root-dir']
08:45:42     INFO -  INFO | runtests.py | Websocket server pid: 4012
08:45:42     INFO -  INFO | runtests.py | Running tests: start.
08:45:42     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-N', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw']
08:45:42     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\bug483440-attack2b.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'bug483440-attack2b', '-t', 'CT,,']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\bug483440-attack7.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'bug483440-attack7', '-t', 'CT,,']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\bug483440-pk10oflo.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'bug483440-pk10oflo', '-t', 'CT,,']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\evintermediate.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'evintermediate', '-t', 'CT,,']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\evroot.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'evroot', '-t', 'CT,,']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\jartests-object.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'jartests-object', '-t', 'CT,,CT']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\pk12util.exe', '-i', 'C:\\slave\\test\\build\\tests\\certs\\mochitest.client', '-w', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm']
08:45:43     INFO -  C:\slave\test\build\tests\bin\pk12util.exe: PKCS12 IMPORT SUCCESSFUL
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\certutil.exe', '-A', '-i', 'C:\\slave\\test\\build\\tests\\certs\\pgoca.ca', '-d', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm', '-f', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\.crtdbpw', '-n', 'pgoca', '-t', 'CT,,']
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\ssltunnel.exe', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm\\ssltunnel.cfg']
08:45:43     INFO -  INFO | automation.py | SSL tunnel pid: 2660
08:45:43     INFO -  args: ['C:\\slave\\test\\build\\tests\\bin\\metrotestharness.exe', '-no-remote', '-profile', 'c:\\users\\cltbld~1.t-w\\appdata\\local\\temp\\tmptun0sm/', 'about:blank', '-firefoxpath', 'C:\\slave\\test\\build\\application\\firefox\\firefox.exe']
08:45:43     INFO -  INFO | automation.py | Application pid: 3100
08:45:43     INFO -  INFO | metrotestharness.exe | firefoxpath: 'C:\slave\test\build\application\firefox\firefox.exe'
08:45:43     INFO -  INFO | metrotestharness.exe | args: '-no-remote -profile c:\users\cltbld~1.t-w\appdata\local\temp\tmptun0sm/ about:blank'
08:45:43     INFO -  INFO | metrotestharness.exe | Launching browser...
08:45:43     INFO -  INFO | metrotestharness.exe | App model id='E4CFE2E6B75AA3A3'
08:45:43     INFO -  INFO | metrotestharness.exe | Harness process id: 3100
08:45:43     INFO -  INFO | metrotestharness.exe | Writing out tests.ini to: 'C:\slave\test\build\application\firefox\tests.ini'
08:45:43  WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication (retry 0) result 80270254
08:45:48  WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication (retry 1) result 80270254
08:45:53  WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication (retry 2) result 80270254
08:45:58  WARNING -  TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication result 80270254
08:45:58     INFO -  INFO | metrotestharness.exe | Deleting C:\slave\test\build\application\firefox\tests.ini
08:45:58     INFO -  INFO | automation.py | Application ran for: 0:00:15.574000
08:45:58     INFO -  INFO | zombiecheck | Reading PID log: c:\users\cltbld~1.t-w\appdata\local\temp\tmp9m_xtopidlog
08:45:59     INFO -  SUCCESS: The process with PID 2660 has been terminated.
08:45:59     INFO -  ERROR: The process with PID 4032 could not be terminated.
08:45:59     INFO -  Reason: There is no running instance of the task.
08:45:59     INFO -  SUCCESS: The process with PID 4012 has been terminated.
08:45:59     INFO -  WARNING | leakcheck | refcount logging is off, so leaks can't be detected!
08:45:59     INFO -  INFO | runtests.py | Running tests: end.
08:46:00     INFO - Return code: 0
08:46:00     INFO - TinderboxPrint: mochitest-metro-immersive<br/><em class="testfail">T-FAIL</em>
08:46:00  WARNING - # TBPL WARNING #
08:46:00  WARNING - The mochitest suite: metro-immersive ran with return status: WARNING
08:46:00     INFO - Copying logs to upload dir...
08:46:00     INFO - mkdir: C:\slave\test\build\upload\logs
program finished with exit code 1
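
For the automatic RETRY idea, the activation failure in a log like the one above could be matched with something along these lines. This is a sketch only: the pattern is taken from the `ActivateApplication` lines in this log, and the `"RETRY"` / `"UNCHANGED"` strings are hypothetical labels, not the actual buildbot status handling.

```python
import re

# Matches both the per-retry lines and the final summary line, e.g.
# "TEST-UNEXPECTED-FAIL | metrotestharness.exe | ActivateApplication (retry 0) result 80270254"
ACTIVATION_FAILURE = re.compile(
    r"TEST-UNEXPECTED-FAIL \| metrotestharness\.exe \| "
    r"ActivateApplication(?: \(retry \d+\))? result ([0-9A-Fa-f]{8})"
)


def activation_failures(log_text):
    """Return the hex result codes of every activation failure in the log."""
    return [m.group(1) for m in ACTIVATION_FAILURE.finditer(log_text)]


def classify(log_text):
    """Return "RETRY" if the log contains an activation failure."""
    return "RETRY" if activation_failures(log_text) else "UNCHANGED"
```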
Hi Q,
Is the change that you deployed to this machine something that we can backout easily?

(In reply to Jim Mathies [:jimm] from comment #95) 
> > Have you ever seen a blue job on tbpl? (either blue or purple)
> > The blue jobs are considered "infra known failures" which automatically
> > re-trigger the job on another machine.
> > For instance, if there is a network blip, hg returns a 404, and so on.
> > What I'm suggesting is to make the job automatically re-trigger if we fail
> > to activate the app.
> 
> Are you suggesting doing this for production and rolling this out? If so
> that's not a call I can make, since it'll suck releng resources until we get
> it fixed. We should also be sure we can backout / update Q's work.

jimm, I'm OK to sometimes have some of the win8 64-bit machines take a job and have to retry on another machine. It's not a long job. We might lose at most 5-7 minutes every once in a while.
Flags: needinfo?(q)
> jimm, I'm OK to sometimes have some of the win8 64-bit machines take a job
> and have to retry on another machine. It's not a long job. We might lose at
> most 5-7 minutes every once in a while.

Ok sounds good.

In your one failed test run, every registry check succeeded.
(In reply to Q from comment #10)
> Created attachment 746788 [details]
> Reg file converted to GPO usable XML
> 
> This can be directly imported into a registry preference GPO

Q, if these machines get reset on every reboot, we should remove the delete orders in this xml script. There's no point in having them if the slaves are reset and this registration is re-imported every time the user logs in.
One thing I thought of, once the win8 tests finish their run, we exit the browser. But we do not flip back to the desktop. Is this going to foul up desktop tests that run on these slaves, or do they logout or reboot on every run?
(In reply to Jim Mathies [:jimm] from comment #107)
> One thing I thought of, once the win8 tests finish their run, we exit the
> browser. But we do not flip back to the desktop. Is this going to foul up
> desktop tests that run on these slaves, or do they logout or reboot on every
> run?

I filed a request for this yesterday as bug 879043. For our Mozmill tests it would be kinda helpful to get back to the desktop. We haven't run those tests yet in our CI so I don't know how other tests will cope with that.
Blocks: 845079
(In reply to Jim Mathies [:jimm] from comment #107)
> One thing I thought of, once the win8 tests finish their run, we exit the
> browser. But we do not flip back to the desktop. Is this going to foul up
> desktop tests that run on these slaves, or do they logout or reboot on every
> run?

We always reboot at the end of each run.
(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #109)
> (In reply to Jim Mathies [:jimm] from comment #107)
> > One thing I thought of, once the win8 tests finish their run, we exit the
> > browser. But we do not flip back to the desktop. Is this going to foul up
> > desktop tests that run on these slaves, or do they logout or reboot on every
> > run?
> 
> We always reboot at the end of each run.

Ok, great.
No longer blocks: 845079
Yes we can back out the change with ease.

(In reply to Armen Zambrano G. [:armenzg] (Release Enginerring) from comment #104)
> Hi Q,
> Is the change that you deployed to this machine something that we can
> backout easily?
> 
> (In reply to Jim Mathies [:jimm] from comment #95) 
> > > Have you ever seen a blue job on tbpl? (either blue or purple)
> > > The blue jobs are considered "infra known failures" which automatically
> > > re-trigger the job on another machine.
> > > For instance, if there is a network blip, hg returns a 404, and so on.
> > > What I'm suggesting is to make the job automatically re-trigger if we fail
> > > to activate the app.
> > 
> > Are you suggesting doing this for production and rolling this out? If so
> > that's not a call I can make, since it'll suck releng resources until we get
> > it fixed. We should also be sure we can backout / update Q's work.
> 
> jimm, I'm OK to sometimes have some of the win8 64-bit machines take a job
> and have to retry on another machine. It's not a long job. We might lose at
> most 5-7 minutes every once in a while.
Flags: needinfo?(q)
(In reply to Q from comment #111)
> Yes we can back out the change with ease.


Great. Armen, is releng ok to roll this out to slaves and get these running on inbound/mc? We should start them as hidden so we can deal with any random orange.
Let's get to action in bug 864418.
Status: NEW → RESOLVED
Closed: 6 years ago
Resolution: --- → FIXED
Attachment #740974 - Attachment mime type: text/x-ms-regedit → text/plain
Attachment #741288 - Attachment mime type: text/x-ms-regedit → text/plain
OS: Windows 8 Metro → Windows 8.1