Closed
Bug 1343049
Opened 8 years ago
Closed 7 years ago
The OS theme isn't consistent on Windows 7 test machines
Categories
(Taskcluster :: Workers, defect, P2)
Taskcluster
Workers
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: MattN, Assigned: grenade)
References
(Blocks 1 open bug)
Details
Attachments
(2 files)
Some machines seem to use Aero Glass while others use Aero Basic. Not only does this change the apeparance of the window chrome, it also seems to also affect the font rendering and I suspect causes intermittent failures e.g. on reftests. For mozscreenshots this causes false changes to be reported since the window chrome is intentionally captured.
What's the best way to proceed?
Is the OS automatically switching to the basic theme due to its heuristics or is something else causing this?
If not, should I expect a more consistent OS theme on TaskCluster if I move M(ss) jobs to it from BB?
Can we force the OS theme back to Aero Glass regularly or set some registry key(s) so the OS doesn't automatically switch this stuff?
| Reporter | ||
Comment 1•8 years ago
|
||
| Comment hidden (off-topic) |
| Comment hidden (off-topic) |
| Reporter | ||
Comment 5•8 years ago
|
||
Johann has been reviewing screenshots more recently. Johann, have you seen the window controls switch between aero basic and glass recently?
| Reporter | ||
Updated•8 years ago
|
Flags: needinfo?(jhofmann)
Comment 6•8 years ago
|
||
Yes, it's definitely still an issue. Recent example I happened to have open in my tabs: https://screenshots.mattn.ca/compare/?oldProject=mozilla-central&oldRev=f223e1fd2044a026c740434df95f37a7f7accf48&newProject=mozilla-central&newRev=6491fb29e7fcc9e02cc179ae2856f426a3552385&filter=windows7
Flags: needinfo?(jhofmann)
Flags: needinfo?(MattN+bmo)
Comment 7•8 years ago
|
||
:grenade: can you please help take a look at this?
From discussion on irc, this appears to definitely be a problem on buildbot hardware, but may also be an issue in AWS (I think you would need to compare to see if it also impacts TC).
I'd like to timebox this so if you can give an estimation of how long it will take for the initial investigation, that would be helpful.
Updated•8 years ago
|
Flags: needinfo?(rthijssen)
Updated•8 years ago
|
Assignee: nobody → relops
Component: General Automation → RelOps
Product: Release Engineering → Infrastructure & Operations
QA Contact: catlee → arich
| Reporter | ||
Comment 8•8 years ago
|
||
Example job with the issue:
https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=6491fb29e7fcc9e02cc179ae2856f426a3552385&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false&filter-searchStr=ss&selectedJob=105539896
see:
> artifact uploaded: 20170608073214-controlCenter_22_darkLWT_about.png
and notice that it has incorrect window controls
Job: Windows 7 pgo Mochitest Mochitest Browser Screenshots M(ss)
Machine: t-w732-ix-112
Updated•8 years ago
|
Assignee: relops → rthijssen
| Reporter | ||
Comment 9•8 years ago
|
||
Two more examples on the same push on two different VM types (t- and g-):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ca362eb7f7464a57137e7b257776a44779dfadf1&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false&group_state=expanded&selectedJob=107224773&filter-searchStr=Windows%207%20VM%20opt%20Mochitest%20Mochitest%20Browser%20Screenshots%20M(ss)
See
> artifact uploaded: 20170614222407-primaryUI_102_tabsOutsideTitlebar_twoPinnedWithOverflow_normal_allToolbars_lightLWT.png
Both are Windows 7 VM opt Mochitest Mochitest Browser Screenshots M(ss)
t-w732-spot-572:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/92a037ccd2d01ab9d53a385aeb6ff3c1fe8af8de8a5bc57d04d3d4dc8aefe3d2f5a96722a814a2589309e51f82a0914dc429fc563d491fd1118194b9679a2265
g-w732-spot-167:
http://mozilla-releng-blobs.s3.amazonaws.com/blobs/try/sha512/255b43d8a3222dd8131e8053de29c0e91fd22718d0221f80ac30b13c16960e208b9b87cee2ced2279727f19cb188b0049ddf146dd8694f2c8827ea93d4b8accf
| Reporter | ||
Comment 10•8 years ago
|
||
(In reply to Matthew N. [:MattN] (huge backlog; PM if requests are blocking you) from comment #0)
> Is the OS automatically switching to the basic theme due to its heuristics
> or is something else causing this?
Btw. I really think it could be as simple as disabling this option that Windows has had for years: http://3.bp.blogspot.com/-5oNn8FcFZmY/VHHsQ9iBjiI/AAAAAAAAAvw/YWWjsNeCVbw/s1600/Adjust%2Bfor%2BBest%2BPeformance.png
I'm not 100% sure if Aero Glass is affected by that but we definitely shouldn't have that set to "Let Windows Choose…" (not sure if it's the case).
https://techjourney.net/force-enable-windows-aero-glass-or-wdm-on-unsupported-and-wddm-incomptible-graphics-card/ gives a registry key to force Aero Glass
| Reporter | ||
Comment 11•8 years ago
|
||
I wouldn't be surprised if this also causes intermittent failures on reftests.
| Assignee | ||
Comment 12•8 years ago
|
||
i think the discrepancy between systems is just that windows turns it on when the nvidia driver and gpu is available and off when we're using the basic display adaptor because we haven't got a gpu. if so, it's unlikely to produce intermittent failures since different tests run on different instance types. there's probably a good argument for leaving aero on, when running gpu related tests since the majority of users that have a gpu available, are likely to be making use of the gpu with aero on.
having said that, if we really want to switch it off and don't think it will break gpu tests on win 7, just run:
net stop uxsms
Flags: needinfo?(rthijssen)
| Assignee | ||
Comment 13•8 years ago
|
||
feel free to reopen if this is still an issue
Status: NEW → RESOLVED
Closed: 8 years ago
Resolution: --- → WONTFIX
Updated•7 years ago
|
Blocks: mozscreenshots-inconsistencies
| Reporter | ||
Comment 14•7 years ago
|
||
A similar issue happened again today but this time even worse because the it's wasn't just Aero disabled, it was using the classic theme on Win7:
Image comparison: https://screenshots.mattn.ca/compare/?oldProject=mozilla-central&oldRev=c2fe4b3b1b930b3e7fdb84eae44cec165394f322&newProject=mozilla-central&newRev=597025d8888fa91b9418231f33e65424d384d83f
Before job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=c2fe4b3b1b930b3e7fdb84eae44cec165394f322&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false&filter-searchStr=screenshot&selectedJob=142574124
After job: https://treeherder.mozilla.org/#/jobs?repo=mozilla-central&revision=597025d8888fa91b9418231f33e65424d384d83f&filter-tier=1&filter-tier=2&filter-tier=3&exclusion_profile=false&filter-searchStr=screenshot&selectedJob=142687617
Job definition: https://dxr.mozilla.org/mozilla-central/rev/2535bad09d720e71a982f3f70dd6925f66ab8ec7/taskcluster/ci/test/mochitest.yml#123-139
Is there something about the job definition that we could change to prevent this?
| Reporter | ||
Updated•7 years ago
|
Flags: needinfo?(rthijssen)
| Reporter | ||
Updated•7 years ago
|
Status: RESOLVED → REOPENED
Resolution: WONTFIX → ---
| Assignee | ||
Comment 15•7 years ago
|
||
we don't attempt to set the theme in infra on taskcluster test instances so not sure why the theme has changed to classic.
on windows 10, we hide the taskbar completely (see bug 1373178). perhaps it would make sense to do the same on windows 7? if this is wanted, it could be accomplished with a patch like this in-tree:
https://hg.mozilla.org/try/rev/1969cd365764d698be50696d9c60f9f3fe744d59
Flags: needinfo?(rthijssen)
| Assignee | ||
Updated•7 years ago
|
Assignee: rthijssen → relops
what I am concerned about is that we have different environments (could be theme, taskbar, resolution, dpi, etc.) which can influence the test results
| Assignee | ||
Comment 17•7 years ago
|
||
agreed but we don't attempt to change theme, taskbar, resolution or dpi from within infra. mostly because those settings are user specific and involve changing registry settings under HKEY_CURRENT_USER.
in occ and infra, we change lots of *system* specific settings which are stored under HKEY_LOCAL_MACHINE. but because generic-worker is designed to run every task under a new user and create that user, after occ has completed and handed control to generic worker) occ has no way to modify the user specific settings.
since occ doesn't attempt to change these user settings and has no control over them (the user account is created by generic worker, not occ), i assume that something in a test run has caused the theme to revert to classic. it's not something occ can influence.
if we want a specific theme, then the test preflight script should explicitly set that theme in-tree (testing/mozharness/configs/unittests/win_taskcluster_unittest.py).
since i don't know off hand, how to specify a theme from the command line, i'm suggesting just auto-hiding the task bar which is the portion of the screenshot which differs from the base screenshot. it also makes sense to match behaviours across windows 7 and 10 and since we already hide the task bar on win 10 and doing so on 7 could potentially fix the issue in this bug, i'm suggesting it as a fix.
a more thorough fix might be to work out how to explicitly set the theme from the command line and also add that to the pre-flight script.
| Reporter | ||
Comment 18•7 years ago
|
||
I agree the taskbar should also be hidden (I actually thought that it was supposed to be already) but it's the least of my worries since it's trivial to crop out of the images. The window controls and toolbar colors are a core part of the browser window and so we shouldn't have any differences in them between runs: https://screenshots.mattn.ca/comparisons/mozilla-central/c2fe4b3b1b930b3e7fdb84eae44cec165394f322/mozilla-central/597025d8888fa91b9418231f33e65424d384d83f/windows7-32/primaryUI_001_tabsInTitlebar_fiveTabs_maximized_onlyNavBar_noLWT.png
You'll see even the scrollbar size changes between themes.
If you want to set the theme programmatically I have hacky scripts for that (which I don't use in automation):
* Vista/7 (untested on 8/10): https://github.com/mnoorenberghe/mozscreenshots/blob/master/mozscreenshots/extension/lib/ChangeWindowsTheme.bat
** example caller: https://github.com/mnoorenberghe/mozscreenshots/blob/fe181bc693edb4bb152e709a69b22277514d3e21/mozscreenshots/extension/configurations/SystemTheme.jsm#L87
| Assignee | ||
Comment 19•7 years ago
|
||
while implementing a preflight script to set the theme, i discovered that the theme is not set to classic at the start of a task (at least on the instances that picked up my tasks. something must be occuring during the test run to cause the theme to change. or possibly windows is falling back to a classic theme for some other reason.
here's some tasks that demonstrate that the gpu and non gpu vm's have the same theme settings and identical screenshots when the task hasn't run a test. these tasks simply log the current theme setting and take a screenshot (which doesn't like the classic theme screenshots harvested by the test failures).
-gecko-t-win7-32:
CurrentTheme registry key is set to C:\windows\resources\Ease of Access Themes\basic.theme
https://tools.taskcluster.net/groups/Ae0RftpvRWaLG1pl3HG1HA/tasks/Ae0RftpvRWaLG1pl3HG1HA/runs/0/logs/public%2Flogs%2Flive.log
https://public-artifacts.taskcluster.net/Ae0RftpvRWaLG1pl3HG1HA/0/public/desktop.png
-gecko-t-win7-32-gpu:
CurrentTheme registry key is set to C:\windows\resources\Ease of Access Themes\basic.theme
https://tools.taskcluster.net/groups/dEuuaJGPQ-GC-cyc0QjIkA/tasks/dEuuaJGPQ-GC-cyc0QjIkA/runs/0/logs/public%2Flogs%2Flive.log
https://public-artifacts.taskcluster.net/dEuuaJGPQ-GC-cyc0QjIkA/0/public/desktop.png
in summary, i don't know what we can do on infra to fix this issue. the theme appears to be correctly set when the task starts (we don't actually set it in infra, it just happens to be correct by default).
i guess we need to understand:
- do the tests do something to change the theme?
- is windows falling back to a classic theme during the test run for some other reason?
| Reporter | ||
Comment 20•7 years ago
|
||
(In reply to Rob Thijssen (:grenade - UTC+2) from comment #19)
> i guess we need to understand:
> - do the tests do something to change the theme?
The M-TC-e10s(ss) jobs don't do anything like that. Is the same account shared between multiple jobs on Win7?
> - is windows falling back to a classic theme during the test run for some
> other reason?
Probably. If I had to guess I would say the problem may be related to the setting in this dialog: https://cl.ly/2x0e2c0x0r0L
We shouldn't let Windows choose as that will cause random changes. Can we force the setting in the registry? It looks like it's possible in HKCU: https://www.experts-exchange.com/questions/21494543/Registry-tweak-for-System-Properties-Advanced-Performance-Settings-Visual-Effects-Adjust-for-Best-Performance.html#a14464460
I think we would want Best Appearance
| Assignee | ||
Comment 21•7 years ago
|
||
testing now:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=4d5a29436b9d20198f8a9ee04eb231c64930a61c&group_state=expanded
Assignee: relops → rthijssen
| Assignee | ||
Comment 22•7 years ago
|
||
looking at the failures here:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d50c80700f9c390bbae8f9c2658f0480926ac34&filter-searchStr=windows7&filter-resultStatus=testfailed&filter-resultStatus=busted&filter-resultStatus=exception&filter-resultStatus=runnable
the screenshots from the test failures show the theme patch appears to work, however i haven't figured out how to trigger the browser-screenshots (ss) tests on try and would prefer to see that those too have the correct theme with this patch.
how do we trigger browser-screenshots tests on try?
i tried without success:
- try: -b do -p win32,win64 -u browser-screenshots -t none
- try: -b do -p win32,win64 -u browser-screenshots-e10s -t none
- try: -b do -p win32,win64 -u ss -t none
- try: -b do -p win32,win64 -u all -t none
Flags: needinfo?(MattN+bmo)
| Assignee | ||
Comment 23•7 years ago
|
||
think i found my answer using the trychooser (try: -b do -p win32 -u browser-screenshots-e10s[Windows 7] -t none)
Flags: needinfo?(MattN+bmo)
| Reporter | ||
Comment 24•7 years ago
|
||
Correct, you need the platforms for the test to be specified. See https://developer.mozilla.org/en-US/docs/Mozilla/QA/Browser_screenshots
| Comment hidden (mozreview-request) |
| Comment hidden (mozreview-request) |
| Reporter | ||
Comment 27•7 years ago
|
||
| mozreview-review | ||
Comment on attachment 8928841 [details]
Bug 1343049 - prevent windows theme change.
https://reviewboard.mozilla.org/r/200126/#review205264
Thanks a lot! Let's give this a try, it can't hurt :)
Attachment #8928841 -
Flags: review?(MattN+bmo) → review+
| Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
| Comment hidden (mozreview-request) |
| Assignee | ||
Comment 30•7 years ago
|
||
for some reason, my first implementation (using `reg add` cmd) was failing occasionally.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=090f06acc5f808b0471d3be3d060c0f7b70eed6e
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ec513c836975c857fa1094339923d78be9e3faeb
i refactored it to run the registry change with powershell commands instead and to restart explorer. that seems to work everywhere.
https://treeherder.mozilla.org/#/jobs?repo=try&revision=f0faf974807178c65f6294a2c8ee88ac1d99531d
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c50eda744ff8da62d2df8d41c6e44672b0e69984
| Assignee | ||
Updated•7 years ago
|
Keywords: checkin-needed
Comment 31•7 years ago
|
||
Pushed by nbeleuzu@mozilla.com:
https://hg.mozilla.org/integration/autoland/rev/2d9ecef9d58f
prevent windows theme change. r=MattN
Keywords: checkin-needed
Comment 32•7 years ago
|
||
| bugherder | ||
Status: REOPENED → RESOLVED
Closed: 8 years ago → 7 years ago
Resolution: --- → FIXED
Blocks: 1417345
Blocks: 1411490
as mentioned in https://bugzilla.mozilla.org/show_bug.cgi?id=1411490#c16, I believe this isn't fixed. We need to provide a 100% reliable environment for testing. Pete, I see Rob is on PTO, can you look into this?
Status: RESOLVED → REOPENED
Flags: needinfo?(pmoore)
Resolution: FIXED → ---
:gbrown, do you have thoughts on how to detect this in all our harnesses? I would like to have a 'preflight' check in our harnesses to ensure that we have the windowsDefaultTheme set for windows- maybe mozharness is the right place. If this is the case we can fail and it would be a clear infrastructure failures.
Flags: needinfo?(gbrown)
Comment 35•7 years ago
|
||
This would quite probably be solved by bug 1399401 since then we are not reusing a logon session between tasks - instead we perform a full winlogon for each task. In generic-worker version 8 running on windows 7, we create new desktops, but the underlying winlogon session is the same.
Depends on: 1399401
Flags: needinfo?(pmoore)
Comment 36•7 years ago
|
||
That still seems like we're making assumptions about what the theme will be rather than being explicit about it, no?
Comment 37•7 years ago
|
||
(In reply to Joel Maher ( :jmaher) (UTC-5) from comment #34)
> :gbrown, do you have thoughts on how to detect this in all our harnesses? I
> would like to have a 'preflight' check in our harnesses to ensure that we
> have the windowsDefaultTheme set for windows- maybe mozharness is the right
> place. If this is the case we can fail and it would be a clear
> infrastructure failures.
A mozharness check might be appropriate. Perhaps hook into an existing "preflight" handler in TestingMixin? Obviously it would be guarded by an _is_windows(). But I don't know how to check for windowsDefaultTheme from mozharness/python.
Flags: needinfo?(gbrown)
:emilio do you know how to determine what windowsDefaultTheme is (i.e. nsGkAtoms::windows_default_theme)? I would like to query this either outside of Firefox ideally.
Flags: needinfo?(emilio)
Comment 39•7 years ago
|
||
I think this is what you're looking for: https://searchfox.org/mozilla-central/rev/7f45cb7cc0229398fc99849bdc150cb6462d6966/widget/windows/nsUXThemeData.cpp#332
Flags: needinfo?(emilio)
Comment 40•7 years ago
|
||
(In reply to Kendall Libby [:fubar] from comment #36)
> That still seems like we're making assumptions about what the theme will be
> rather than being explicit about it, no?
Agreed.
Joel, should the test setup not ensure the correct theme is selected?
Flags: needinfo?(jmaher)
Pete, I think we could do that- in the last 10+ years we haven't done that, our machines have been consistent. One issue here is we need to fix this in many harnesses (even mozharness is split into 3 or 4 different scripts, test harnesses are 8+). In terms of where things should break down, we should assume the OS is stable, the test harness should just run tests and make sure that firefox bits, prefs, env are setup properly.
Mozharness is sort of a grey area- Can we not make a default theme inside of taskcluster worker?
Pre taskcluster we would run the OS and not reboot it all the time and keep the logged in session- we never had issues with the theme going away. I am concerned here as our current taskcluster worker solution keeps a session open for multiple test runs and somehow we end up with different themes- to me I don't see this getting fixed with logging in with fresh users.
Flags: needinfo?(jmaher)
I am working on some mozharness checks in bug 1423658
Mark/Rob- this issue for windows7 has been ongoing for the last few months- I have seen a lot of fixes, but there is always another small issue that sneaks in. When we get back from the break, could one of you look into this? I would like to land my code in bug 1423658 to check that aero is enabled before running tests, but that will cause so many failures it would be counterproductive :(
Flags: needinfo?(rthijssen)
Flags: needinfo?(mcornmesser)
| Assignee | ||
Comment 44•7 years ago
|
||
happy to look into this but would like some clarification on what the problem is are we still seeing screenshots with windows classic theme?
the patch which landed in comment 32 runs a preflight command which sets the os theme to the default for "best appearance" which is supposed to ensure that aero is enabled and not use the classic theme. it was intended to fix the reported issue of screenshots containing classic theme when aero is expected.
if we're still seeing screenshots with classic theme can we get some links to tests that contain these (so that i know how to replicate) i was under the impression it was fixed so just need to know how to reproduce if it's not.
Flags: needinfo?(rthijssen)
Updated•7 years ago
|
Flags: needinfo?(mcornmesser)
here is an example of a push (from today with recent code):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=233e41a5d2acb63109c1e5fe67e55128b6de83c8&filter-tier=1&filter-tier=2&filter-tier=3
there are a couple failures and you can see that they all have failed OS themes, here are some direct links to some failure logs:
https://public-artifacts.taskcluster.net/XMBCS70BSPqYqAzygKIzvA/0/public/logs/live_backing.log
https://public-artifacts.taskcluster.net/V8NCGrvoSbmhFx6zkT4j5A/0/public/logs/live_backing.log
https://public-artifacts.taskcluster.net/WXrN2ycOSma6Hxm-v9wVGg/0/public/logs/live_backing.log
https://public-artifacts.taskcluster.net/Nse4tcg2Qsm56EZYJ-KZYA/0/public/logs/live_backing.log
:grenade, let me know if you need more information from me to figure out what is going on. I am happy to fiddle with more try pushes!
Flags: needinfo?(rthijssen)
| Assignee | ||
Comment 46•7 years ago
|
||
all of the failures are triggered by new code from the try push. eg: when `ctypes.windll.user32.SystemParametersInfoA(SPI_GETHIGHCONTRAST, 0, ctypes.byref(highContrastInfo), 0)` returns nothing.
the check for SPI_GETHIGHCONTRAST (in the try push) is run before the attempt to set the default theme to aero. if it is a good way to determine that aero is running, it should at least be run after the preflight scripts that ensure windows is configured for best appearance. otherwise we're simply failing because we're running the check before we've attempted to turn aero on. if we were to check after, we might get better results.
i still have some concerns about using SPI_GETHIGHCONTRAST to determine if aero is running. are we pretty sure this is a valid way to check that aero is running? i was under the impression that the high contrast settings were for visually impaired users to get higher contrast themes...
Flags: needinfo?(rthijssen)
I pulled that code from what we do in C++ to check for the theme:
https://searchfox.org/mozilla-central/source/widget/windows/nsUXThemeData.cpp#340
My try push fails when it is not able to find the Aero theme- I assume we could look a 30 random win7 reftest runs and search for isDefaultTheme:False, and we would find a few of those.
My python->c++ is not guaranteed to be accurate, but when 90% of the jobs run and pass my test, it is a good indication that my code is detecting something off- possibly it would find less issues if we did do this in a different spot in the code- although we should have the OS theme fully set prior to running our mozharness code.
Would you like me to adjust my patch at all?
here are some failures from last week where "windowsDefaultTheme":false:
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=153474367&lineNumber=2836
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=153482013&lineNumber=2778
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-inbound&job_id=153585302&lineNumber=2781
https://treeherder.mozilla.org/logviewer.html#?repo=mozilla-central&job_id=153603116&lineNumber=2763
those are all from bug 1411490.
Basically if the browser is not able to detect that Aero is running, we shouldn't be running tests.
| Assignee | ||
Comment 49•7 years ago
|
||
yes, agreed. i'd misread some of the logic in the patch too. we do need to run the aero check later in the preflight scripts collection (after the exploroer restart) in order to let the "best appearance" preflight script take effect. i'm beginning to wonder if a delay is also required (a few seconds) to allow the theme change to take effect. because the error suggests that no theme is set (ERROR: No theme to be found: ''), suggesting perhaps that the theme change is in progress? i'm only guessing but the blank theme setting seems odd.
:grenade, could you experiment with setting a 5 second delay after setting the theme before moving on? I could add a 5 second delay before checking for the theme as a way to validate something like that...do you think 5 seconds is enough time?
I added a time.sleep(5) in my script before checking for the Aero theme, and I still get failures:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=d55925836396c39e566c56cc9f2da8b9349e3fa2
this tells me either:
1) we need more time for the theme to get set
2) we cannot reliably set the theme for a session
3) there is a bug and the theme fails to get set and we need to make that more foolproof
4) probably other things that I am not seeing
:grenade, can you take a look and weigh in here?
Flags: needinfo?(rthijssen)
| Assignee | ||
Comment 52•7 years ago
|
||
testing the same patch but with the aero check moved to the end of the preflight scripts (after aero is set in preflight script 'set windows VisualFX' and the explorer restart)...
https://treeherder.mozilla.org/#/jobs?repo=try&revision=2a389623bf0bd39a2ca3b58bf2417046e34c5cc6
Flags: needinfo?(rthijssen)
| Assignee | ||
Comment 53•7 years ago
|
||
ok. that seemed to work (had to re-push without -artifact for some reason):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=cfa056e566cd940289c14015a08fae756a262c33
tests still fail, but the check for aero succeeds when it is run after the 'set windows VisualFX' preflight script and with the 5 second delay.
actually the patch has enable:False, so the test isn't run, there are some JMAHER print statements in the python script that I do not see in the log files to confirm this.
| Assignee | ||
Comment 55•7 years ago
|
||
no. the patch linked in comment 53 has enable set to True:
https://hg.mozilla.org/try/rev/cfa056e566cd940289c14015a08fae756a262c33#l1.21
the jmaher log line is here:
https://treeherder.mozilla.org/logviewer.html#?job_id=154360133&repo=try&lineNumber=1559
i did several other pushes while testing, including some where the aero check was disabled but i didn't link to them from this bug. they'd only show up in my try push history.
i stand by my earlier comments that if the aero check is run after we turn aero on and with a delay of a few seconds to allow aero to init, the aero check succeeds.
here is a try push after moving the check to the end- you can see 34 failures:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=5d321503d3579938c26f2ad7d3592b8df343b9f7
that is 1.4% of the total jobs- so this still happens frequently- I would be happy to add this check so we stop getting confusing reftest failure messages and can see what our overall failure rate is- the concern with doing this is that this could be by far our #1 failure on the tree and we would need some commitment to fixing this in the short term.
| Assignee | ||
Comment 57•7 years ago
|
||
thank you. i hadn't tried a large enough set of tests to see that the problem persists. i was mistaken in my thinking.
i don't know what to do at this stage.
i've already doing everything i can to ensure that aero is set as the default theme. i don't know how to do more than that.
your aero check proves that despite configuring aero as the default theme, the os still reverts to some theme that is not aero or some unthemed configuration in at least some percentage of tests. i don't know why this happens. i don't know how to prevent it from happening.
if anyone can shed light on why aero might stop working or what we can do to keep it working, i'm happy to implement any suggested fix, but i honestly have no idea where to go at this point. i'm stumped.
can we determine if the failures are on machines that are fresh installs vs been around for a while?
Comment 59•7 years ago
|
||
In jmaher's try push, I notice all the aero-check failures seem to be preceded by failures in the powershell commands, and I don't see that discussed here. Worth noting?
https://public-artifacts.taskcluster.net/Ai5JHLbUShG40gC-o4l8bg/0/public/logs/live_backing.log
5:55:19 INFO - Running command: ('powershell', '-command', '"&{&Set-ItemProperty -Path \'HKCU:Software\\Microsoft\\Windows\\CurrentVersion\\Explorer\\VisualEffects\' -Name VisualFXSetting -Value 1}"') in Z:\task_1515253831\build
15:55:19 INFO - Copy/paste: powershell -command "\"&{&Set-ItemProperty -Path 'HKCU:Software\Microsoft\Windows\CurrentVersion\Explorer\VisualEffects' -Name VisualFXSetting -Value 1}\""
15:55:20 INFO - Set-ItemProperty : Cannot find path 'HKCU:\Control Panel\Cursors\' because it
15:55:20 INFO - does not exist.
15:55:20 INFO - At
15:55:20 INFO - C:\windows\System32\WindowsPowerShell\v1.0\Microsoft.PowerShell_profile.ps1:29
15:55:20 INFO - char:1
15:55:20 INFO - + Set-ItemProperty 'HKCU:\Control Panel\Cursors\' -Type 'String' -Name ...
15:55:20 INFO - + ~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
15:55:20 INFO - + CategoryInfo : ObjectNotFound: (HKCU:\Control Panel\Cursors\:St
15:55:20 INFO - ring) [Set-ItemProperty], ItemNotFoundException
15:55:20 INFO - + FullyQualifiedErrorId : PathNotFound,Microsoft.PowerShell.Commands.SetIt
15:55:20 INFO - emPropertyCommand
15:55:21 INFO - &{&Set-ItemProperty -Path 'HKCU:Software\Microsoft\Windows\CurrentVersion\Explorer\VisualEffects' -Name VisualFXSetting -Value 1}
15:55:21 INFO - Return code: 0
What is this telling us? Why would HKCU:\Control Panel\Cursors\ not exist??
Comment 60•7 years ago
|
||
Also, I find myself wondering if getting a screenshot in the failure case would show anything interesting. Has anyone tried? screenshot.exe should be available...but probably only after mozharness download_and_extract?
| Assignee | ||
Comment 61•7 years ago
|
||
(In reply to Geoff Brown [:gbrown] from comment #59)
> What is this telling us? Why would HKCU:\Control Panel\Cursors\ not exist??
nice catch! HKCU is HKEY_CURRENT_USER. the fact it's missing suggests that the user registry hive has failed to load. i will investigate this possibility.
(In reply to Geoff Brown [:gbrown] from comment #60)
> Also, I find myself wondering if getting a screenshot in the failure case
> would show anything interesting. Has anyone tried? screenshot.exe should be
> available...but probably only after mozharness download_and_extract?
i have tried to get screenshots with `nircmd savescreenshot .\build\blobber_upload_dir\is-aero-running.png`. the command succeeds, but for some reason the artifact isn't uploaded. i don't know if i've got the path wrong or if there's something amiss with the artifact upload mechanism (https://dxr.mozilla.org/mozilla-central/source/taskcluster/taskgraph/transforms/job/mozharness_test.py#233).
see here for an example of how i've tried to get a screenshot. maybe it can be improved on:
https://hg.mozilla.org/try/rev/7524678918fb8d0970e2265fc6659473eae9fdd2#l1.13
Comment 62•7 years ago
|
||
(In reply to Rob Thijssen (:grenade UTC+2) from comment #61)
> see here for an example of how i've tried to get a screenshot. maybe it can
> be improved on:
> https://hg.mozilla.org/try/rev/7524678918fb8d0970e2265fc6659473eae9fdd2#l1.13
Thanks -- that almost worked except the upload directory did not exist at that time!
I managed to get screenshots in
https://treeherder.mozilla.org/#/jobs?repo=try&revision=ce8e972d0f8f8a41c940441d4ee169a4b594fd1d
(my changes to the screenshot filename may not be necessary - just speculative).
Comment 63•7 years ago
|
||
Screenshots + theme failures:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=6d4f939e3d0f1e2fa978975d8bf2ec7f47e59fa6
| Reporter | ||
Comment 64•7 years ago
|
||
The failure screenshots show the "Windows Classic" theme, not "Windows 7" (aero) or "Windows 7 Basic" btw so it's possible that patch that landed has prevented falling back to Basic or it just didn't show up on those jobs.
(In reply to Pete Moore [:pmoore][:pete] (on PTO - back 19 January) from comment #35)
> This would quite probably be solved by bug 1399401 since then we are not
> reusing a logon session between tasks - instead we perform a full winlogon
> for each task. In generic-worker version 8 running on windows 7, we create
> new desktops, but the underlying winlogon session is the same.
(In reply to Kendall Libby [:fubar] from comment #36)
> That still seems like we're making assumptions about what the theme will be
> rather than being explicit about it, no?
I think this may be the best solution at this point since the theme can be specified for new accounts via GPO. See "Load a specific theme" at https://docs.microsoft.com/en-us/previous-versions/windows/it-pro/windows-7/ee617164(v=ws.10)
That above combined with the following (also in GPO listed on that page) should ensure a consistent desktop environment:
* Prevent changing theme
* Prevent changing visual style for windows and buttons
* Prevent changing window color and appearance
* Prevent changing desktop background
* Prevent changing desktop icons
* Prevent changing mouse pointers
* Prevent changing screen saver
* Prevent changing sounds (optional)
| Reporter | ||
Comment 65•7 years ago
|
||
Apparently there is also a "Force a specific visual style or force Windows Classic" policy where you can give the path to the Aero theme: http://windows7themes.net/en-us/windows-7-theme-group-policy/
| Assignee | ||
Comment 66•7 years ago
|
||
i added a check to the script that launches generic-worker to check for the existence of the registry key HKCU\Software\Microsoft\Windows\CurrentVersion\Explorer\VisualEffects:
https://github.com/mozilla-releng/OpenCloudConfig/blob/b86ccacc/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L9-L15
the check prevents generic worker from starting if the registry key doesn't yet exist. the key is specific to each user (because it's under HKCU). in the case of gw testers using version 8.x the user is GenericWorker. the user is created by generic-worker.exe during the gw install command. i think windows creates the key when the user first logs in.
this is also the key which we update to set windows to use "best appearance" theming.
the check proves that the registry key does not exist before the first time that generic worker is launched because gw never starts when this check is in use. presumably the launch of generic worker does something to trigger windows to create the key.
i'm all out of ideas about what to do from the occ/windows setup side of things as it seems the key cannot exist until after windows setup is complete and gw has been installed and run for the first time. the only mechanism i had available was to try to prevent gw from running if the key didn't exist but that sends me in circles since starting gw seems to be what triggers creation of the key. the preflight scripts are useful for checking the key exists and setting it's value, but in the small percent of cases where the key doesn't exist there's really nothing for it.
pete: what do you think about getting gw to terminate the instance if the reg key doesn't exist?
Flags: needinfo?(pmoore)
Pete, following up here- I would like to keep this bug moving forward- what are your thoughts on comment 66 ?
Comment 68•7 years ago
|
||
: pete could you please take a look at this bug because it is blocking 1417345 which will be recommended for disable very soon.
Also, guys could you please help with this bug and 1433851 in order to solve them because 1417345 is depending on these bugs, thanks.
Flags: needinfo?(rthijssen)
Flags: needinfo?(gbrown)
Flags: needinfo?(MattN+bmo)
| Assignee | ||
Comment 69•7 years ago
|
||
i've said my piece and done everything i can do. i believe generic-worker needs to be patched to terminate instances where the user profile has failed to load before running tasks. this can't be done from the wrapper script and can't be done from occ so until we get some help from the g-w side, i'm bowing out.
Assignee: rthijssen → relops
Flags: needinfo?(rthijssen)
| Assignee | ||
Updated•7 years ago
|
Assignee: relops → nobody
Component: RelOps → Generic-Worker
Product: Infrastructure & Operations → Taskcluster
Comment 70•7 years ago
|
||
Does the output from comment 59 appear only in failure cases, or also success cases too? I'm wondering if this might be a red herring.
Currently we run generic-worker v8.2.0 on gecko-t-win7-32-gpu as GenericWorker user. The user is created at install time, and the worker is started via the task scheduler, triggered by the user logging in via winlogon. The automatic logon is a consequence of configuring Windows to automatically log on as the GenericWorker user.
Therefore the logon has already occurred by the time generic-worker starts.
The batch script which starts generic-worker is patched in OCC to only start under certain conditions, such as making sure that DSC is not running. When it starts generic-worker, generic-worker will not trigger a new logon, it will simply attempt to claiming and execute a task.
Therefore if this registry key does not exist before Generic-Worker is started, it will not be created by generic-worker. If at some point later it exists, it is likely to have been something a task did to create it.
Please note, since this is an older version of generic-worker, and currently running in a mode where it does not create task users to sandbox tasks - it is not creating users, or performing any type of session logon - it is simply running as the current (admin) user, just like buildbot did.
From comment 9, it looks like this also happened in buildbot (as the host names are buildbot hosts) - so I don't think this is particular to the mechanics of generic-worker.
I think the path forward is to establish if there really is a 1:1 mapping between this HKCU key missing and the wrong theme being in place, or not. Then to confirm it really never exists before the worker starts up, and that it is created after the worker starts up - or whether something in the task creates or doesn't create the key.
In comment 35, I said we create new desktop sessions between tasks - this was incorrect, as I had forgotten this worker type has setting:
> "runTaskAsCurrentUser": true
This means no desktops or users will get created. In this case, generic-worker is just executing processes under the current user context as the current user.
Flags: needinfo?(pmoore)
| Assignee | ||
Comment 71•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #70)
> Does the output from comment 59 appear only in failure cases, or also
> success cases too? I'm wondering if this might be a red herring.
only in cases of failure.
> Therefore if this registry key does not exist before Generic-Worker is
> started, it will not be created by generic-worker. If at some point later it
> exists, it is likely to have been something a task did to create it.
the key is created by windows after the autologon script does something that triggers the user environment to be started. in the case of generic worker and the wrapper script, the key does not exist before generic worker is started (in 100% of cases). in a small percentage of cases (something like 1.5%), the key is never successfully created. this makes it impossible to set ANY hkcu registry keys, such as the one required to force aero.
> Please note, since this is an older version of generic-worker, and currently
> running in a mode where it does not create task users to sandbox tasks - it
> is not creating users, or performing any type of session logon - it is
> simply running as the current (admin) user, just like buildbot did.
no. it is running as the GenericWorker user. This user is created by generic worker during the `generic-worker install ..`
> From comment 9, it looks like this also happened in buildbot (as the host
> names are buildbot hosts) - so I don't think this is particular to the
> mechanics of generic-worker.
comment 9 relates to a different issue, which occured months before we patched preflight scripts to set HKCU registry keys
> I think the path forward is to establish if there really is a 1:1 mapping
> between this HKCU key missing and the wrong theme being in place, or not.
> Then to confirm it really never exists before the worker starts up, and that
> it is created after the worker starts up - or whether something in the task
> creates or doesn't create the key.
i have done this work and established that there really is a 1:1 mapping. please see my earlier comments in this bug
> This means no desktops or users will get created. In this case,
> generic-worker is just executing processes under the current user context as
> the current user.
yes. generic worker runs under the GenericWorker user account created by generic-worker.exe
Comment 72•7 years ago
|
||
> no. it is running as the GenericWorker user. This user is created by generic worker during the `generic-worker install ..`
The point I wanted to make is that the GenericWorker user creation happened at install time, new users are not created on the fly.
If something should be done after the user is created, this is quite possible in OCC - there is nothing to prevent you from performing steps after installing generic-worker. You don't even need to use the generic-worker install target if you don't want to - it simply creates a user, a scheduled task, and sets the automatic Windows logon to use that new user.
This is different from when we run in a sandboxed mode where a new user is created for every task.
> i have done this work and established that there really is a 1:1 mapping. please see my earlier comments in this bug
Since the generic-worker startup occurs on logon, it could be possible the logon is still in progress when it starts. Have you tried waiting for the key to exist, or did you just see if the key exists? This could be the solution, if you haven't tried that yet.
> yes. generic worker runs under the GenericWorker user account created by generic-worker.exe
As above, the generic-worker install process created the GenericWorker user. If you want to run generic-worker under an OCC created user, you are absolutely free to do this. When you run the generic worker, it is not creating users or desktops, just executing processes under the current user context. When you install the generic-worker, it does create that user and set up an autologon - but like I say - if you have a preferred mechanism to do that, you can do it differently. You are not bound to using that method. To run it, you just need the executable.
Comment 73•7 years ago
|
||
https://www.bandicam.com/free-screen-recorder/ might also be interesting to get a clearer picture of what happens in the 1.4% of failure cases. Perhaps if the recording is deleted if the problem does not occur, so is only uploaded as an artifact in the case of failure.
| Assignee | ||
Comment 74•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #72)
> logon is still in progress when it starts. Have you tried waiting for the
> key to exist, or did you just see if the key exists? This could be the
> solution, if you haven't tried that yet.
yes in comment 66 i linked to this change:
https://github.com/mozilla-releng/OpenCloudConfig/blob/b86ccacc/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L9-L15
which waits indefinitely for the key to be created before launching generic worker.
the result of this patch was that generic worker never starts as the key is never created.
> As above, the generic-worker install process created the GenericWorker user.
> If you want to run generic-worker under an OCC created user, you are
> absolutely free to do this. When you run the generic worker, it is not
> creating users or desktops, just executing processes under the current user
> context. When you install the generic-worker, it does create that user and
> set up an autologon - but like I say - if you have a preferred mechanism to
> do that, you can do it differently. You are not bound to using that method.
> To run it, you just need the executable.
i don't have a preferred mechanism and i'd like to reiterate that in most cases (likely 98.5% of cases) the current mechanism works fine. what is needed is for generic worker to not attempt to run tasks in an invalid environment as in the 1.5% of cases where the HKCU registry hive is not loaded or created.
Comment 75•7 years ago
|
||
(In reply to Rob Thijssen (:grenade UTC+2) from comment #74)
> yes in comment 66 i linked to this change:
> https://github.com/mozilla-releng/OpenCloudConfig/blob/b86ccacc/userdata/
> Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L9-
> L15
> which waits indefinitely for the key to be created before launching generic
> worker.
> the result of this patch was that generic worker never starts as the key is
> never created.
I think there is a bug on line 12 with the extra colon.
Comment 76•7 years ago
|
||
(In reply to Rob Thijssen (:grenade UTC+2) from comment #74)
> i don't have a preferred mechanism and i'd like to reiterate that in most
> cases (likely 98.5% of cases) the current mechanism works fine. what is
> needed is for generic worker to not attempt to run tasks in an invalid
> environment as in the 1.5% of cases where the HKCU registry hive is not
> loaded or created.
Unfortunately implementing this in generic-worker would be no different to implementing this in OCC - it would simply wait for the key to exist, or exit after a certain timeout if it didn't. Given that the implementation would be identical, I don't believe it would get us any further. Furthermore it would muddy the concerns of generic-worker (this is a bootstrapping concern, normally handled by OCC) and would make the software more brittle and difficult to maintain and manage. The bugfix from comment 75 will establish if this key is created inside or outside the worker, which is key to deciding the solution.
| Assignee | ||
Comment 77•7 years ago
|
||
you're right about the extra colon bug. could be the whole problem. i will retest with that patched. thanks!
Comment 78•7 years ago
|
||
Sorry, I don't know how I can help here. Glad to see Rob and Pete hashing this out!
Flags: needinfo?(gbrown)
| Assignee | ||
Comment 79•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #76)
> Unfortunately implementing this in generic-worker would be no different to
> implementing this in OCC - it would simply wait for the key to exist, or
> exit after a certain timeout if it didn't. Given that the implementation
> would be identical, I don't believe it would get us any further. Furthermore
> it would muddy the concerns of generic-worker (this is a bootstrapping
> concern, normally handled by OCC) and would make the software more brittle
> and difficult to maintain and manage. The bugfix from comment 75 will
> establish if this key is created inside or outside the worker, which is key
> to deciding the solution.
what i was asking for (a check for the HKCU hive) actually can't be implemented in occ. occ runs as the system user. HKCU is specific to the user environment. it is only loaded when the user is logged in. the reason i was asking for the check to be implemented in generic-worker is because only generic-worker runs in the user environment of the generic-worker user.
occ has no processes (except for the gw wrapper script) that run under the same user environment as generic worker. when i didn't understand that the wrapper script had a bug (the misplaced colon), i think it was perfectly valid to expect the patch be made in generic-worker. only generic worker (and it's wrapper script) run in the right environment to make the check. occ doesn't. which is why there's nothing we can change in occ that would be effective in knowing if the runtime environment for the generic worker process is valid. i still think it's perfectly reasonable for this check to exist in generic-worker and that in fact, it really should exist in generic worker. i'm now testing a patched version of the wrapper script and if that succeeds, i'll be happy to leave it at that, but from a separation of concerns perspective i think it's perfectly valid to expect generic-worker to make a few checks on the environment it is running in before running tasks that will fail if that environment is flawed. i'd also point out that generic worker creates the user and user environment in which it runs so i don't think its too much to ask for generic worker to also validate that environment, instead of first creating it and then expecting an outside process to validate it.
| Assignee | ||
Comment 80•7 years ago
|
||
unfortunately, results with the colon removed were the same. generic worker never starts.
here's the patch:
https://github.com/mozilla-releng/OpenCloudConfig/commit/1461a1804ce4a436e6d7db0ad5395d630e565620
and here's the try push from last night where the jobs were queued but never claimed (i cancelled this morning):
https://treeherder.mozilla.org/#/jobs?repo=try&revision=c1ec249d084f85da207b979e8ea87a697339193a
and here's the papertrail logs showing that generic worker never starts:
https://papertrailapp.com/groups/7318662/events?focus=895583201341394970
so for all the reasons outlined above, i think the ball is squarely in the generic-worker court and generic-worker will have to be patched to not take tasks if the HKCU registry hive is not loaded or if some other problem in the user space that generic worker creates, prevents tasks from running successfully (leftover data from previous tasks etc. see bug 1433854 & bug 1433851).
i hope i've explained well enough in comment 79 why occ is not able to implement a fix here. believe me, i have tried and if i thought there was a way to do it without patching gw i would. i'm open to ideas that don't involve writing some massive helper script that holds gw's hand here, instead of just fixing gw so that it's capable of validating the environment it creates before it runs tasks.
Comment 81•7 years ago
|
||
(In reply to Rob Thijssen (:grenade UTC+2) from comment #80)
> and here's the papertrail logs showing that generic worker never starts:
> https://papertrailapp.com/groups/7318662/events?focus=895583201341394970
If we take one of the systems from around this time, such as:
https://papertrailapp.com/systems/1563579072/events
We never see a log line "Checking user registry hive is loaded..." which should come from https://github.com/mozilla-releng/OpenCloudConfig/blob/1461a1804ce4a436e6d7db0ad5395d630e565620/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L10 - so it looks like that check isn't running. This suggests either some problem with this script, or that this script isn't getting executed at all. At the moment I'm not sure what might be the cause of this, as indeed your patch provides a seemingly identical environment to the non-beta version, with just a few extra lines added to run-generic-worker.bat.
I would expect to see this log line because it is written to C:\generic-worker\generic-worker.log which is logged to papertrail.
I guess the only other difference between the beta and non-beta might be the versions of the base AMI and the worker type config. However, if run-generic-worker.bat isn't even getting run, then worker type differences are unlikely to be the cause.
So now I'm a bit stumped - I will keep digging.
Comment 82•7 years ago
|
||
Note, I *don't* believe the other stray colon on line https://github.com/mozilla-releng/OpenCloudConfig/blob/1461a1804ce4a436e6d7db0ad5395d630e565620/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L8 to be the cause - as I tested on a win7 VM locally - and both `goto <label>` and `goto :<label>` seem to work.
Comment 83•7 years ago
|
||
Note - the worker type config was slightly different (`"runTasksAsCurrentUser": true` on gecko-t-win7-32-gpu and `"runTasksAsCurrentUser": false` on gecko-t-win7-32-gpu-b) so I've updated the beta to also have `true` for this setting.
This *isn't* the cause of run-generic-worker.bat not getting run, but was worth changing so that they behave in the same way after we do get it running.
Comment 84•7 years ago
|
||
I've made a new OCC deployment here: https://tools.taskcluster.net/groups/IlPbhf6rS0uuRujH7W4NRg/tasks/EEtcFftRReC4RX0UcLMX4w/runs/0/logs/public%2Flogs%2Flive.log
This has this extra commit, just to find out if this script is executed at all, or if it starts, but doesn't reach the part where it looks for the registry key:
https://github.com/mozilla-releng/OpenCloudConfig/commit/d822a5b346721566ad25d84661c3f06d7d78dce5
Comment 85•7 years ago
|
||
Here[1] we see:
Feb 01 12:33:53 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com generic-worker: Running generic-worker startup script (run-generic-worker.bat) ...
Feb 01 12:33:53 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-User_Profiles_Service: <12>1 2018-02-01T11:33:08.213800+00:00 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-User_Profiles_Service 1076 - [NXLOG@14506 Keywords="-9223372036854775808" EventType="WARNING" EventID="1530" ProviderGuid="{89B1E9F0-5AFF-44A6-9B44-0A07A7CE5845}" Version="0" Task="0" OpcodeValue="0" RecordNumber="14838" ActivityID="{00000000-7BA8-0000-04BD-4429509BD301}" ThreadID="1096" Channel="Application" Domain="NT AUTHORITY" AccountName="SYSTEM" UserID="S-1-5-18" AccountType="User" Opcode="Info" Detail="1 user registry handles leaked from \\Registry\\User\\S-1-5-21-2192203079-3471170106-2880594189-1002:
Feb 01 12:33:53 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-User_Profiles_Service: Process 844 (\\Device\\HarddiskVolume1\\Windows\\System32\\winlogon.exe) has opened key \\REGISTRY\\USER\\S-1-5-21-2192203079-3471170106-2880594189-1002
Feb 01 12:33:53 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-User_Profiles_Service: " EventReceivedTime="2018-02-01 11:33:51" SourceModuleName="eventlog" SourceModuleType="im_msvistalog"] Windows detected your registry file is still in use by other applications or services. The file will be unloaded now. The applications or services that hold your registry file may not function properly afterwards. DETAIL - 1 user registry handles leaked from \Registry\User\S-1-5-21-2192203079-3471170106-2880594189-1002: Process 844 (\Device\HarddiskVolume1\Windows\System32\winlogon.exe) has opened key \REGISTRY\USER\S-1-5-21-2192203079-3471170106-2880594189-1002
Feb 01 12:33:55 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-WMI: Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage > 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected.
Feb 01 12:33:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-CAPI2: Failed extract of third-party root list from auto update cab at: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: A required certificate is not within its validity period when verifying against the current system clock or the timestamp in the signed file. .
Feb 01 12:33:59 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Enabling Windows update service
Feb 01 12:33:59 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: userdata run starting.
Feb 01 12:33:59 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: system timezone set to UTC.
Feb 01 12:34:00 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: system clock synchronised.
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: isWorker: True.
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: workerType: gecko-t-win7-32-gpu-b.
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: availabilityZone: eu-central-1b, dnsRegion: euc1.
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Mount-DiskOne :: begin
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Mount-DiskOne :: skipping disk mount (drives y: and z: already exist).
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Mount-DiskOne :: end
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: begin
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: automatic managed pagefile disabled.
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: page file: y:\pagefile.sys, created.
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com USER32: The process C:\windows\system32\shutdown.exe (I-024A86E6BED55) has initiated the restart of computer I-024A86E6BED55 on behalf of user NT AUTHORITY\SYSTEM for the following reason: Application: Maintenance (Planned) Reason Code: 0x80040001 Shutdown Type: restart Comment: page file y:\pagefile.sys created
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: end
Feb 01 12:34:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: begin
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: 3 volumes detected.
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: C: 30gb
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Y: 120gb
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Z: 120gb
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: 3 disk partitions detected.
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Disk #0, Partition #0: 30gb
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Disk #1, Partition #0: 120gb
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Disk #2, Partition #0: 120gb
Feb 01 12:34:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: end
Feb 01 12:34:47 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com generic-worker: Running generic-worker startup script (run-generic-worker.bat) ...
Feb 01 12:34:49 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-WMI: Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage > 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected.
Feb 01 12:34:51 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-CAPI2: Failed extract of third-party root list from auto update cab at: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: A required certificate is not within its validity period when verifying against the current system clock or the timestamp in the signed file. .
Feb 01 12:34:52 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Enabling Windows update service
Feb 01 12:34:52 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: userdata run starting.
Feb 01 12:34:53 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: system timezone set to UTC.
Feb 01 12:34:53 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: system clock synchronised.
Feb 01 12:34:56 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: isWorker: True.
Feb 01 12:34:56 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: workerType: gecko-t-win7-32-gpu-b.
Feb 01 12:34:56 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: availabilityZone: eu-central-1b, dnsRegion: euc1.
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Mount-DiskOne :: begin
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Mount-DiskOne :: skipping disk mount (drives y: and z: already exist).
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Mount-DiskOne :: end
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: begin
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: skipping pagefile creation (y:\pagefile.sys exists).
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Set-Pagefile :: end
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: begin
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: 3 volumes detected.
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: C: 30gb
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Y: 120gb
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Z: 120gb
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: 3 disk partitions detected.
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Disk #0, Partition #0: 30gb
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Disk #1, Partition #0: 120gb
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: Disk #2, Partition #0: 120gb
Feb 01 12:34:57 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Map-DriveLetters :: end
Feb 01 12:35:58 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: executed: "C:\windows\Microsoft.Net\Framework\v2.0.50727\ngen.exe executeQueuedItems".
Feb 01 12:35:58 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: executed: "C:\windows\Microsoft.Net\Framework\v4.0.30319\ngen.exe executeQueuedItems".
Feb 01 12:35:58 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: domain set to: gecko-t-win7-32-gpu-b.euc1.mozilla.com
Feb 01 12:35:58 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: instanceType: g2.2xlarge.
Feb 01 12:35:58 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: begin
Feb 01 12:36:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: generic-worker is not running.
Feb 01 12:36:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: OpenCloudConfig is running.
Feb 01 12:36:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: instance appears to be initialising.
Feb 01 12:36:10 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: scheduled task: HaltOnIdle deleted.
Feb 01 12:36:10 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: C:\dsc\HaltOnIdle.ps1 deleted.
Feb 01 12:36:10 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: C:\dsc\HaltOnIdle.ps1 downloaded from https://raw.githubusercontent.com/mozilla-releng/OpenCloudConfig/master/userdata/HaltOnIdle.ps1?dddff4b9-606f-44cc-abd5-17286ffaea9a.
Feb 01 12:36:11 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: scheduled task: HaltOnIdle created.
Feb 01 12:36:11 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: end
Feb 01 12:36:11 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: begin
Feb 01 12:36:12 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: scheduled task: PrepLoaner deleted.
Feb 01 12:36:12 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: C:\dsc\PrepLoaner.ps1 deleted.
Feb 01 12:36:12 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: C:\dsc\PrepLoaner.ps1 downloaded from https://raw.githubusercontent.com/mozilla-releng/OpenCloudConfig/master/userdata/PrepLoaner.ps1?a249e7a1-77dc-420a-91c6-e34b81a42809.
Feb 01 12:36:13 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: scheduled task: PrepLoaner created.
Feb 01 12:36:13 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Create-ScheduledPowershellTask :: end
Feb 01 12:36:13 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Stop-DesiredStateConfig :: begin
Feb 01 12:36:13 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Stop-DesiredStateConfig :: end
Feb 01 12:36:13 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Remove-DesiredStateConfigTriggers :: begin
Feb 01 12:36:14 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: scheduled task: RunDesiredStateConfigurationAtStartup, deleted.
Feb 01 12:36:14 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: C:\windows\System32\Configuration\Current.mof deleted
Feb 01 12:36:14 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: C:\dsc\rundsc.ps1 deleted
Feb 01 12:36:14 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Remove-DesiredStateConfigTriggers :: end
Feb 01 12:36:14 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: New-LocalCache :: begin
Feb 01 12:36:14 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: New-LocalCache :: end
Feb 01 12:36:15 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: log archive C:\log\20180201113452.userdata-run.zip created.
Feb 01 12:36:15 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: generic-worker installation detected.
Feb 01 12:36:15 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: waiting for generic-worker process to start.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is going to invoke the consistency check for task of type REBOOT.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is running LCM method PerformRequiredConfigurationChecks with the flag set to 2.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The local configuration manager started.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {2EF8DAE2-0744-11E8-B044-068A4138A600} : Operation Consistency Check or Pull started by user sid S-1-5-20 from computer NULL.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is going to start the consistency timer for the first time with value 15 minutes.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer successfully created the consistency timer.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is going to start the refresh timer for the first time with value 30 minutes.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer successfully created the refresh timer.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {2EF8DAE2-0744-11E8-B044-068A4138A600} : Running consistency engine.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {2EF8DAE2-0744-11E8-B044-068A4138A600} : Consistency Engine did not find a current or pending configuration to apply
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {2EF8DAE2-0744-11E8-B044-068A4138A600} : Consistency engine was run successfully.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {2EF8DAE2-0744-11E8-B044-068A4138A600} : Job runs under the following LCM setting. ConfigurationMode: ApplyAndMonitor ConfigurationModeFrequencyMins: 15 RefreshMode: PUSH RefreshFrequencyMins: 30 RebootNodeIfNeeded: NONE DebugMode: False
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {2EF8DAE2-0744-11E8-B044-068A4138A600} : Operation Consistency Check or Pull completed successfully.
Feb 01 12:36:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer has successfully run LCM method PerformRequiredConfigurationChecks with flag 2.
Feb 01 12:37:50 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The local configuration manager was shut down.
Feb 01 12:38:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: generic-worker is not running.
Feb 01 12:38:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: OpenCloudConfig is running.
Feb 01 12:38:04 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: instance appears to be initialising.
Feb 01 12:38:15 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: no generic-worker process detected.
Feb 01 12:38:18 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: Z: drive formatted.
Feb 01 12:38:18 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com OpenCloudConfig: userdata run completed
Feb 01 12:38:18 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com USER32: The process C:\windows\system32\shutdown.exe (I-024A86E6BED55) has initiated the restart of computer I-024A86E6BED55 on behalf of user NT AUTHORITY\SYSTEM for the following reason: Application: Maintenance (Planned) Reason Code: 0x80040001 Shutdown Type: restart Comment: reboot to rouse the generic worker
Feb 01 12:39:03 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com generic-worker: Running generic-worker startup script (run-generic-worker.bat) ...
Feb 01 12:39:06 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-WMI: Event filter with query "SELECT * FROM __InstanceModificationEvent WITHIN 60 WHERE TargetInstance ISA "Win32_Processor" AND TargetInstance.LoadPercentage > 99" could not be reactivated in namespace "//./root/CIMV2" because of error 0x80041003. Events cannot be delivered through this filter until the problem is corrected.
Feb 01 12:39:07 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-CAPI2: Failed extract of third-party root list from auto update cab at: <http://www.download.windowsupdate.com/msdownload/update/v3/static/trustedr/en/authrootstl.cab> with error: A required certificate is not within its validity period when verifying against the current system clock or the timestamp in the signed file. .
Feb 01 12:40:09 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: generic-worker is not running.
Feb 01 12:40:10 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: OpenCloudConfig is not running.
Feb 01 12:40:10 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Get-Uptime :: last boot: 2/1/2018 11:38:45 AM; uptime: 00:01:22.4273000.
Feb 01 12:40:10 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: instance failed some validity checks and will be retested shortly.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is going to invoke the consistency check for task of type REBOOT.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is running LCM method PerformRequiredConfigurationChecks with the flag set to 2.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The local configuration manager started.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {C7808B89-0744-11E8-A0EB-068A4138A600} : Operation Consistency Check or Pull started by user sid S-1-5-20 from computer NULL.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is going to start the consistency timer for the first time with value 15 minutes.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer successfully created the consistency timer.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer is going to start the refresh timer for the first time with value 30 minutes.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer successfully created the refresh timer.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {C7808B89-0744-11E8-A0EB-068A4138A600} : Running consistency engine.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {C7808B89-0744-11E8-A0EB-068A4138A600} : Consistency Engine did not find a current or pending configuration to apply
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {C7808B89-0744-11E8-A0EB-068A4138A600} : Consistency engine was run successfully.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {C7808B89-0744-11E8-A0EB-068A4138A600} : Job runs under the following LCM setting. ConfigurationMode: ApplyAndMonitor ConfigurationModeFrequencyMins: 15 RefreshMode: PUSH RefreshFrequencyMins: 30 RebootNodeIfNeeded: NONE DebugMode: False
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: Job {C7808B89-0744-11E8-A0EB-068A4138A600} : Operation Consistency Check or Pull completed successfully.
Feb 01 12:41:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The DscTimer has successfully run LCM method PerformRequiredConfigurationChecks with flag 2.
Feb 01 12:42:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: generic-worker is not running.
Feb 01 12:42:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Is-ConditionTrue :: OpenCloudConfig is not running.
Feb 01 12:42:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: Get-Uptime :: last boot: 2/1/2018 11:38:45 AM; uptime: 00:03:17.8361000.
Feb 01 12:42:05 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com HaltOnIdle: instance failed some validity checks and will be retested shortly.
Feb 01 12:42:06 i-024a86e6bed557b0e.gecko-t-win7-32-gpu-b.euc1.mozilla.com Microsoft-Windows-DSC: The local configuration manager was shut down.
So we see that we reach line 3 of https://github.com/mozilla-releng/OpenCloudConfig/blob/d822a5b346721566ad25d84661c3f06d7d78dce5/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L3 but we never reach line 12.
--
[1] https://papertrailapp.com/systems/1564259182/events?focus=895683824485322760&selected=895683824485322760
Comment 86•7 years ago
|
||
In other words, I believe there is a bug somewhere in these lines, which is the cause of the problem:
https://github.com/mozilla-releng/OpenCloudConfig/blob/d822a5b346721566ad25d84661c3f06d7d78dce5/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L5-L11
Comment 87•7 years ago
|
||
New OCC deploy to find out which line has the problem: https://github.com/mozilla-releng/OpenCloudConfig/commit/ab552b54b021ef05ec46c4c78979676cab878fa0
Deploying in: https://github.com/mozilla-releng/OpenCloudConfig/commit/ab552b54b021ef05ec46c4c78979676cab878fa0
Comment 88•7 years ago
|
||
Comment 89•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #87)
> Deploying in: https://github.com/mozilla-releng/OpenCloudConfig/commit/ab552b54b021ef05ec46c4c78979676cab878fa0
Correction: https://tools.taskcluster.net/groups/ZkeW9AawTi-5stGjrR1Egg/tasks/IH93v0NmRtCyDlAl1N6Zmw/runs/0/logs/public%2Flogs%2Flive.log
Comment 90•7 years ago
|
||
Looks like we get to Position E in the following log[1]:
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Running generic-worker startup script (run-generic-worker.bat) ...
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position A
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position B
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position C
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position D
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position E
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Running generic-worker startup script (run-generic-worker.bat) ...
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position A
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position B
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position C
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position D
Feb 01 14:57:50 i-06ae9ef464c9b8e73.gecko-t-win7-32-gpu-b.usw2.mozilla.com generic-worker: Position E
We also see that the script is started twice.
So the line we seem to get stuck at[2] is:
> if "%USERNAME%" != "GenericWorker" goto CheckForStateFlag
This suggests either:
1) a syntax error with this if statement
2) %USERNAME% isn't GenericWorker and the `goto CheckForStateFlag` is executed. This would then further imply that either
a) Z:\loan exists since if it didn't, we would expect to see "Checking for C:\dsc\task-claim-state.valid file..." in the logs
b) the check for Z:\loan (lines 27,28) has invalid syntax
c) lines 27,28 causes the script to exit for another reason (perhaps if Z: drive doesn't exist)
I will investigate!
--
[1] https://papertrailapp.com/systems/1564340502/events?focus=895719823827456068&selected=895719823827456068
[2] https://github.com/mozilla-releng/OpenCloudConfig/blob/ab552b54b021ef05ec46c4c78979676cab878fa0/userdata/Configuration/GenericWorker/run-generic-worker-format-and-reboot-beta.bat#L14
Comment 91•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #90)
> This suggests either:
>
> 1) a syntax error with this if statement
> 2) %USERNAME% isn't GenericWorker and the `goto CheckForStateFlag` is
> executed. This would then further imply that either
> a) Z:\loan exists since if it didn't, we would expect to see "Checking
> for C:\dsc\task-claim-state.valid file..." in the logs
> b) the check for Z:\loan (lines 27,28) has invalid syntax
> c) lines 27,28 causes the script to exit for another reason (perhaps if
> Z: drive doesn't exist)
>
> I will investigate!
And the answer was ........ 1).
!= is not valid in an IF statement: https://ss64.com/nt/if.html
== is allowed.
Comment 92•7 years ago
|
||
Comment 93•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #88)
> I've triggered
> https://tools.taskcluster.net/groups/LyVQlNq9SYSykwLkok0z0w/tasks/
> LyVQlNq9SYSykwLkok0z0w/details
This task is now running. That seems to have done the trick.
I'll make a patch to roll out to production.
Comment 94•7 years ago
|
||
This is the version that was working on gecko-t-win7-32-gpu-b.
Credit to :grenade - this is really his work, with just a tweak from me.
Thanks!
Attachment #8947481 -
Flags: review?(rthijssen)
Comment on attachment 8947481 [details] [review]
Github Pull Request for OpenCloudConfig
:mark, can you look at this as :grenade is on holiday today.
Attachment #8947481 -
Flags: review?(mcornmesser)
Comment 96•7 years ago
|
||
r+ from grenade in IRC:
https://mozilla.logbot.info/releng/20180201#c14228185
Updated•7 years ago
|
Attachment #8947481 -
Flags: review?(rthijssen)
Attachment #8947481 -
Flags: review?(mcornmesser)
Attachment #8947481 -
Flags: review+
| Assignee | ||
Comment 97•7 years ago
|
||
Comment on attachment 8947481 [details] [review]
Github Pull Request for OpenCloudConfig
lgtm
Comment 98•7 years ago
|
||
Thanks Rob!
Deploying in https://tools.taskcluster.net/groups/TsEtVPzrTzmuaDZZKN2uDQ/tasks/UIgCjIgIQ4GsIZAhZZa39A/details
Joel,
Note - you'll need to wait at least *30 mins after* the above task completes before submitting try pushes to ensure they run on the new Win7 GPU images.
Comment 99•7 years ago
|
||
Summary (for those that don't wish to read the entire bug)
==========================================================
The generic-worker on this environment is started as a scheduled task for the GenericWorker user, which is configured to execute when the user logs in.
The scheduled task starts:
C:\generic-worker\run-generic-worker.bat
which in turn waits for certain conditions (like that machine configuration has completed, drives are all mounted, etc) before starting the executable C:\generic-worker\generic-worker.exe.
It seems that the scheduled task that runs C:\generic-worker\run-generic-worker.bat was calling it before the registry setting:
> HKEY_CURRENT_USER\Software\Microsoft\Windows\CurrentVersion\Explorer\VisualEffects
exists. This may be because the user registry hive has not been mounted at all, or could be that the desktop initialisation is still in progress.
Either way, by waiting in run-generic-worker.bat for this registry key to exist, before starting generic-worker.exe, we ensure that it is available for any task that requires it to be there.
So in the end it appeared to be a race condition - the generic worker was starting too quickly before the explorer initialisation had completed, and in these cases the correct theme was not (yet) loaded.
Comment 100•7 years ago
|
||
Seems to have deployed ok - papertrail logs look good, and tasks are getting executed.
Comment 101•7 years ago
|
||
e.g. https://tools.taskcluster.net/groups/PBXU6CQNQUSCUNXJn6ihbg/tasks/YxDPlHDIRayfZdl7o4NBIg/runs/0 is running in https://papertrailapp.com/systems/1565012922/events
So gecko-t-win7-32-gpu workers now have the fix, and normal try pushes should be using fixed workers.
| Reporter | ||
Comment 102•7 years ago
|
||
Thank you Pete and Rob for getting this figured out!
Flags: needinfo?(MattN+bmo)
umm, I think this is fixed:
https://treeherder.mozilla.org/#/jobs?repo=try&revision=dc31322663600ed37ba767e91793fb12848eae0c
the 1 failure has windowsDefaultTheme:true- typically I see 20+ failures for default theme with that patch. I say this is fixed now.
:pmoore, is there anymore work todo in order to roll this out or clean up loose ends?
Flags: needinfo?(pmoore)
Comment 104•7 years ago
|
||
Strange about that 1 test failure - but I guess we can close this. I couldn't see any image attachments to the failed task, but I guess if it had `windowsDefaultTheme:true` then it indicates it might be the same problem, but occurring far less frequently. I'll let you decide whether we should do something about it. :-)
Happy to put this one to bed. :-)
/me turns the lights out
Status: REOPENED → RESOLVED
Closed: 7 years ago → 7 years ago
Flags: needinfo?(pmoore)
Resolution: --- → FIXED
one other thought I had is do we have this rolled out to non gpu instances as well?
Flags: needinfo?(pmoore)
Comment 107•7 years ago
|
||
It is currently rolled out to the following worker types:
gecko-1-b-win2012
gecko-1-b-win2012-beta
gecko-2-b-win2012
gecko-3-b-win2012
gecko-t-win10-64
gecko-t-win10-64-beta
gecko-t-win10-64-cu
gecko-t-win10-64-gpu
gecko-t-win10-64-gpu-b
gecko-t-win7-32
gecko-t-win7-32-beta
gecko-t-win7-32-cu
gecko-t-win7-32-gpu
Flags: needinfo?(pmoore)
Comment 108•7 years ago
|
||
Note though, it has no effect on the win2012 machines, since they generate users on-the-fly for tasks.
Comment 109•7 years ago
|
||
Hey guys, I wonder if this fix will help with any of the notification tests we needed to disable. Is it worth us checking if they work better now?
Flags: needinfo?(jmaher)
Flags: needinfo?(MattN+bmo)
Comment 110•7 years ago
|
||
(In reply to Pete Moore [:pmoore][:pete] from comment #104)
> Strange about that 1 test failure - but I guess we can close this. I
> couldn't see any image attachments to the failed task, but I guess if it had
> `windowsDefaultTheme:true` then it indicates it might be the same problem,
> but occurring far less frequently. I'll let you decide whether we should do
> something about it. :-)
Just FTR, that failure wasn't theme-related; it was an example of the other type of common Win7 intermittent reftest failures we still have, bug 1430572. (Which is the follow-up to bug 1396260, where the same underlying issue was mitigated but not really fixed by hyper-chunking the Win7 reftests.)
:pmoore- there are some notification tests that have been failing on your tests for the new worker- I am not sure which specific notification test you are referencing; I don't think this work will help the handful of test failures you see with the new worker.
Flags: needinfo?(jmaher)
Updated•7 years ago
|
Blocks: tc-stability
| Reporter | ||
Comment 112•7 years ago
|
||
Dolske is finding someone else to look into fixing the notification tests so we'll find out soon.
Flags: needinfo?(MattN+bmo)
Updated•7 years ago
|
Assignee: nobody → rthijssen
Updated•6 years ago
|
Component: Generic-Worker → Workers
You need to log in
before you can comment on or make changes to this bug.
Description
•