Closed
Bug 550815
Opened 14 years ago
Closed 14 years ago
Buildbot doesn't start reliably on recent win32 slaves
Categories
(Release Engineering :: General, defect)
Tracking
(Not tracked)
RESOLVED
FIXED
People
(Reporter: nthomas, Assigned: rail)
References
()
Details
(Whiteboard: [buildslaves][opsi])
Attachments
(8 files, 3 obsolete files)
7.74 KB, patch | catlee: review+ | bhearsum: checked-in+
66.20 KB, image/png
10.44 KB, patch | catlee: review+ | bhearsum: checked-in+
1.38 KB, patch | catlee: review+ | bhearsum: checked-in+
1.08 KB, patch | bhearsum: review+ | bhearsum: checked-in+
462 bytes, patch | bhearsum: review+ | rail: checked-in+
933 bytes, patch | bhearsum: review+ | bhearsum: checked-in+
393 bytes, text/plain
Two related sets of symptoms
* there are several mw32-ix-slaveNN slaves that don't always start buildbot after rebooting, and are just sitting at the desktop with nothing running. I'm using the timestamps in twistd.log and 'net statistics server | head' (as uptime proxy) to determine that. Reopened bug 547799 to get a nagios check for buildbot
* moz2-win32-slave50ish - 60 have intermittent problems determining the installed compilers when the MozillaBuild terminal starts, or
Hopefully we can figure out the changes between VM slave50ish and earlier that might have caused this.
Reporter | ||
Comment 2•14 years ago
|
||
Checked just now and only 5 of 24 win32 ix slaves were still connected to their production masters, over the space of a day or two.
Comment 3•14 years ago
|
||
I'll do my best to resolve this after I finish up with the linux ix machine issues in bug 549672
Assignee: nobody → bhearsum
Comment 4•14 years ago
|
||
Looks like OPSI is at fault here. It appears we're running a different version of the "preloginloader" package on all the failing machines, as well as the ref platforms. After uninstalling OPSI on mw32-ix-slave01, buildbot had no trouble starting through a few reboots.
Comment 5•14 years ago
|
||
And for posterity, preloginloader version 3.4-27 is the problem version.
Reporter | ||
Comment 6•14 years ago
|
||
We have 3.4-27 on win32-slave50 and upwards, so that correlates with the VMs that also have trouble starting buildbot.
Comment 7•14 years ago
|
||
It appears that the working machines are running preloginloader 3.3-22, based on my diffing. This would make sense, because we ran OPSI 3.3 prior to deploying on Vista, when we had to upgrade to 3.4. The Vista machines *require* preloginloader 3.4-27 to work. Assuming I'm right about the version, this gets tricky. There's no way I know of to have multiple versions of a package installed on the server, so trying to manage having different versions for different machines is very difficult. The best option would be to find a preloginloader that works for all of our machines. There have been quite a few released since the last time we tried to upgrade, so the most recent one (3.4-39) may work for us. Last time we tried to upgrade it was a disaster, and caused Vista hangs.
Blocks: 545136
Comment 8•14 years ago
|
||
Alice helpfully reminds me that we've mostly phased out Vista at this point, so Vista support is not required in whatever preloginloader package we use. I've tested 3.4-39 on XP and win2k3. Works fine on XP, crashes at startup on Win2k3. This version is useless for us. I'll look for other versions to try next week.
Comment 9•14 years ago
|
||
Possibly related or at fault here is that the automatic login happens while OPSI is checking for and installing packages, rather than afterwards. This didn't used to be the case, so it's possibly a bug that only exists in 3.4-27.
Reporter | ||
Comment 10•14 years ago
|
||
On busy days for the trees we lose the ix machines at quite a clip. Any luck finding other versions to test ?
Comment 11•14 years ago
|
||
Sorry, I got sidetracked with other things, again :(. I didn't find any other versions to try other than even *more* experimental, which don't seem worth the effort. Catlee suggested starting Buildbot in a loop at startup, rather than just once. I gave this a try by modifying d:\mozilla-build\start-buildbot.bat to do so, and that seemed to continue to work after many reboots. I still need to tweak it a bit to support both the 'slave' and 'moz2_slave' buildbot dirs, and add it to the things deployed in the buildbot-batch-file OPSI package. I *hope* I can get to this tomorrow or Thursday.
Comment 12•14 years ago
|
||
(In reply to comment #11)
> Sorry, I got sidetracked with other things, again :(.
>
> I didn't find any other versions to try other than even *more* experimental,
> which don't seem worth the effort.
>
> Catlee suggested starting Buildbot in a loop at startup, rather than just once.
> I gave this a try by modifying d:\mozilla-build\start-buildbot.bat to do so,
> and that seemed to continue to work after many reboots. I still need to tweak
> it a bit to support both the 'slave' and 'moz2_slave' buildbot dirs, and add it
> to the things deployed in the buildbot-batch-file OPSI package.

I did some more testing and my script continued to work well. However, I could not reproduce the original problem in staging, even with a production slave that had just failed. Furthermore, there seem to be two ways to fail here:
1) Removing the sleep from the batch file in the start menu causes Buildbot to fail to start 100% of the time, even in staging.
2) Sometimes in production, slaves will fail to start buildbot correctly despite the sleep. I believe the cause of this is different than 1) and, based on not being able to reproduce in staging, I'm starting to think it's related to load on the master. The scenario could be: slave tries to start -> master doesn't respond for awhile -> slave dies. The Buildbot process isn't supposed to die in that case, but something weird could be going on.

I'm going to prepare an OPSI package that rolls out my updated start-up scripts and I'd like to roll it out in production and see if it fixes the issue. If the scenario is anything like what I described, it should.
Comment 13•14 years ago
|
||
So, this package replaces the buildbot-batch-file package and drops in the three files we use in starting Buildbot:
* buildbot.bat (start menu, deals with tac generation as well)
* start-buildbot.bat (copy of start-msvs8.bat from mozillabuild, modified to launch buildbot)
* start-buildbot.sh (called from start-buildbot.bat, launches buildbot slave in a loop for some period of time)
I've tested this on mw32-ix-slave01 and both the installation and launching of Buildbot have worked fine.
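For readers who want a picture of what "launches buildbot slave in a loop for some period of time" could look like, here is a minimal sketch. It is not the attached start-buildbot.sh: the function name `start_with_retries`, its arguments, and the assumption that the start command's exit status reflects success are all made up for illustration.

```shell
#!/bin/bash
# Hypothetical sketch: keep retrying a start command until it succeeds
# or a time budget runs out. Names and values are illustrative only.

start_with_retries() {
    local max_seconds=$1   # total time budget for all attempts
    local pause=$2         # seconds to wait between attempts
    shift 2                # remaining arguments are the command to run

    local begin=$SECONDS   # bash keeps a running seconds counter in SECONDS
    local attempt=1
    while true; do
        echo "$attempt: Running $*"
        if "$@"; then
            echo "$attempt: Ran $*"
            return 0
        fi
        # Stop once the time budget has been used up.
        if (( SECONDS - begin >= max_seconds )); then
            echo "Giving up after $attempt attempts"
            return 1
        fi
        sleep "$pause"
        attempt=$((attempt + 1))
    done
}
```

A call might look like `start_with_retries 600 10 buildbot start /e/builds/slave`, where the 600 second budget and 10 second pause are arbitrary example values.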
Attachment #437355 -
Flags: review?(catlee)
Comment 14•14 years ago
|
||
Catlee, I addressed the comments about the sleeping and simplified the elapsed time measurement. I also tested a few scenarios where jobs ended quickly. Thankfully, the machine rebooted cleanly, and without Buildbot reconnecting to the master in the middle of the shutdown. So, AFAICT, doing it this way is safe.
Attachment #437355 -
Attachment is obsolete: true
Attachment #437617 -
Flags: review?(catlee)
Attachment #437355 -
Flags: review?(catlee)
Updated•14 years ago
|
Attachment #437617 -
Flags: review?(catlee) → review+
Comment 15•14 years ago
|
||
Comment on attachment 437617 [details] [diff] [review] buildbot startup, rev2 changeset: 43:f394588cd62c
Attachment #437617 -
Flags: checked-in+
Comment 16•14 years ago
|
||
I set this to roll out on all of the build slaves. I tested one by hand, and it's working fine. Leaving open until at least tomorrow, when we can assess whether or not it worked.
Comment 17•14 years ago
|
||
(In reply to comment #0)
> Two related sets of symptoms
> * there are several mw32-ix-slaveNN slaves that don't always start buildbot
> after rebooting, and are just sitting at the desktop with nothing running. I'm
> using the timestamps in twistd.log and 'net statistics server | head' (as
> uptime proxy) to determine that. Reopened bug 547799 to get a nagios check for
> buildbot

Overnight, we haven't seen any more of these.

> * moz2-win32-slave50ish - 60 have intermittent problems determining the
> installed compilers when the MozillaBuild terminal starts, or

I found two machines hitting these issues this morning. Neither had the new OPSI package installed, but considering they hang before start-buildbot.bat completes, I doubt it will help.
Comment 18•14 years ago
|
||
Comment 19•14 years ago
|
||
mw32-ix-slave18 didn't start buildbot properly today. Nagios reported it down 2 minutes after the machine booted -- so it never even got into the .sh file :-(. It could be that the IX machines are failing with the same, strange SDK error the VMs have hit, but aren't blocking on the dialog. Back to square one here :(.
Comment 20•14 years ago
|
||
(In reply to comment #0)
> Two related sets of symptoms
> * there are several mw32-ix-slaveNN slaves that don't always start buildbot
> after rebooting, and are just sitting at the desktop with nothing running. I'm
> using the timestamps in twistd.log and 'net statistics server | head' (as
> uptime proxy) to determine that. Reopened bug 547799 to get a nagios check for
> buildbot

Based on the fact that there have been very few machines disconnected since the new startup scripts landed, I think that this part is fixed.

> * moz2-win32-slave50ish - 60 have intermittent problems determining the
> installed compilers when the MozillaBuild terminal starts, or

But as mentioned in my previous comment, there is at least one ix machine which has the same issues as these VMs -- which don't appear to be solved yet.
Comment 21•14 years ago
|
||
A number of ix boxes seem to be hitting this. I've been restarting them by hand, and then getting new nagios errors later -- I'm going to just start acking with this bug.
Comment 22•14 years ago
|
||
(In reply to comment #21)
> A number of ix boxes seem to be hitting this.
> I've been restarting them by hand, and then getting new nagios errors later --
> I'm going to just start acking with this bug.

Yes...this bug has been tracking the issues with the mw32-ix machines and slaves 50-59 for weeks now. When you see machines in this state, please start buildbot by launching the batch file in the start menu until we have a fix for this.
Comment 23•14 years ago
|
||
Grasping at straws, I'm doing some large diffs to see if there are any interesting, undocumented changes between an older slave (win32-slave26) and a newer one (win32-slave52). So far, I've diff'ed the entire mozilla-build install directory, and come up with nothing interesting. Full list of differences:
* Lots of pyc files (expected, harmless)
* Parts of the tar installation were not updated on the latest machines (binaries were, support scripts and docs were not)
* .bash_history for Administrator and cltbld
Other than that listed above, the two mozilla-build installations were absolutely identical. I've also diffed dumps of the Visual Studio portion of the registry between them. They were 100% identical. Next up is a diff of the entire msvs8 installation.
Comment 24•14 years ago
|
||
There appears to be no meaningful difference in the MSVS8 installations, either. Complete list:
* msvs8/Common7/IDE/ItemTemplatesCache/cache.bin
* msvs8/Common7/IDE/ProjectTemplatesCache/cache.bin
* msvs8/SmartDevices/SDK/SDKTools/cabwiz.exe.inf
* msvs8/VC/vcpackages/WCE.VCPlatform.config
Comment 26•14 years ago
|
||
(In reply to comment #20)
> (In reply to comment #0)
> > Two related sets of symptoms
> > * there are several mw32-ix-slaveNN slaves that don't always start buildbot
> > after rebooting, and are just sitting at the desktop with nothing running. I'm
> > using the timestamps in twistd.log and 'net statistics server | head' (as
> > uptime proxy) to determine that. Reopened bug 547799 to get a nagios check for
> > buildbot
>
> Based on the fact that there have been very few machines disconnected since the
> new startup scripts landed, I think that this part is fixed.
>

I just saw an IX machine that definitely hit this exact issue. It's looking more and more like my startup script changes haven't fixed anything.
Comment 27•14 years ago
|
||
This morning I found 21/24 production ix machines disconnected, and 2/10 afflicted VMs.
Comment 28•14 years ago
|
||
Adds better logging to all of the files we use in the Buildbot startup, including guess-msvc.bat, which is managed with this package starting with this patch. Hopefully this will clue us in to exactly where we're hitting issues.
Attachment #438758 -
Flags: review?(catlee)
Updated•14 years ago
|
Attachment #438758 -
Flags: review?(catlee) → review+
Updated•14 years ago
|
Attachment #438758 -
Flags: checked-in+
Comment 29•14 years ago
|
||
Comment on attachment 438758 [details] [diff] [review] add better startup logging to windows boot This is set to roll out on all of the build slaves.
Comment 30•14 years ago
|
||
Lots of slaves failed overnight again, but this time they did some logging! Here's a successful start:
Tue 04/13/2010 22:31:20.79 - Very start of buildbot.bat"
"Tue 04/13/2010 22:31:21.62 - Sleeping at the end of buildbot.bat"
"Tue 04/13/2010 22:31:38.32 - About to run start-buildbot.bat"
"Tue 04/13/2010 22:31:38.43 - About to call guess-msvc.bat"
"Tue 04/13/2010 22:31:38.45 - Start of guess-msvc.bat"
"Tue 04/13/2010 22:31:38.46 - About to query MSVC8KEY"
"Tue 04/13/2010 22:31:38.48 - Queried, VC8DIR is "
"Tue 04/13/2010 22:31:38.57 - End of guess-msvc.bat"
"Tue 04/13/2010 22:31:38.57 - Calling vcvars32.bat in VC8DIR"
"Tue 04/13/2010 22:31:38.64 - About to run start-buildbot.sh"
"Tue 04/13/2010 22:31:38.65 - start-buildbot.sh finished"
Tue Apr 13 22:31:41 PDT 2010 - start of start-buildbot.sh
1: Running /d/mozilla-build/python25/scripts/buildbot start /e/builds/moz2_slave
1: Ran /d/mozilla-build/python25/scripts/buildbot start /e/builds/moz2_slave
It's a bit muddled because we're logging to the same file from both cmd and MSYS. The "start-buildbot.sh finished" line is supposed to be at the end.

Here's a failed one:
Wed 04/14/2010 2:00:06.25 - Very start of buildbot.bat"
"Wed 04/14/2010 2:00:07.01 - Sleeping at the end of buildbot.bat"
"Wed 04/14/2010 2:00:38.42 - About to run start-buildbot.bat"
"Wed 04/14/2010 2:00:38.53 - About to call guess-msvc.bat"
"Wed 04/14/2010 2:00:38.54 - Start of guess-msvc.bat"
"Wed 04/14/2010 2:00:38.57 - About to query MSVC8KEY"
"Wed 04/14/2010 2:00:38.57 - Queried, VC8DIR is "
"Wed 04/14/2010 2:00:38.67 - End of guess-msvc.bat"
"Wed 04/14/2010 2:00:38.67 - Calling vcvars32.bat in VC8DIR"
"Wed 04/14/2010 2:00:38.73 - About to run start-buildbot.sh"
"Wed 04/14/2010 2:00:38.75 - start-buildbot.sh finished"
So, it appears as if the 'start' command at the end of start-buildbot.bat (http://hg.mozilla.org/build/opsi-package-sources/file/78a67cfb7bff/buildbot-startup/CLIENT_DATA/start-buildbot.bat#l66) fails, because we never see the prints from start-buildbot.sh.
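The difference between the two logs above (the batch side finishes, but the shell side never writes its first line) can be checked mechanically. The following is a rough sketch only: the marker strings are copied from the logs in this comment, while the function name `classify_last_start` and its three output words are assumptions invented for the example.

```shell
#!/bin/bash
# Hypothetical helper: classify the most recent startup recorded in a log
# like the ones above. Not part of any attached patch.

classify_last_start() {
    local log=$1
    # Keep only the lines from the last "Very start of buildbot.bat" onward,
    # since the same file accumulates entries across reboots.
    local last_run
    last_run=$(awk '/Very start of buildbot.bat/ { buf = "" }
                    { buf = buf $0 "\n" }
                    END { printf "%s", buf }' "$log")

    if ! grep -q "About to run start-buildbot.sh" <<<"$last_run"; then
        echo "batch-incomplete"      # never reached the 'start' command
    elif grep -q "start of start-buildbot.sh" <<<"$last_run"; then
        echo "ok"                    # the shell script logged its first line
    else
        echo "shell-never-started"   # 'start' ran but the .sh left no trace
    fi
}
```

Run against the failed log above, this sketch would report `shell-never-started`, which is the state described in the last sentence of the comment.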
Comment 31•14 years ago
|
||
One quick theory is that rxvt is crashing. It feels familiar, but I can't dig anything like that up at the moment.
Assignee | ||
Comment 32•14 years ago
|
||
(In reply to comment #31)
> One quick theory is that rxvt is crashing. It feels familiar, but I can't dig
> anything like that up at the moment.

Can we run cmd instead of rxvt? Are there any requirements for rxvt? The following command worked fine on one of the dead slaves:
start /min "Buildbot" "d:\mozilla-build\msys\bin\bash" --login -c /d/mozilla-build/start-buildbot.sh
Comment 33•14 years ago
|
||
(In reply to comment #32)
> (In reply to comment #31)
> > One quick theory is that rxvt is crashing. It feels familiar, but I can't dig
> > anything like that up at the moment.
>
> Can we run cmd instead of rxvt? Are there any requirements for rxvt?

We could, but I'm hard pressed to make a change like that without evidence that it helps, or at least doesn't bust anything. If you have time, could you run a slave in staging which launches with cmd.exe for awhile?
Comment 34•14 years ago
|
||
I found some registry differences in the OPSI section between ix01, which is working fine at this point, and another ix machine, so I tried copying ix01's settings to that machine, but that didn't fix the problem there. Then I tried a bunch of other things:
* Reinstalling OPSI from the server -- still had startup troubles
* Uninstalling OPSI entirely -- fixed it
* Installing OPSI from the server again -- still had startup troubles
* Manually installing the OPSI client with files from a VM that has never failed -- still had startup troubles
It's pretty clear that something OPSI is doing is getting in the way, but I'm not sure what.
Assignee | ||
Comment 35•14 years ago
|
||
(In reply to comment #33)
> We could, but I'm hard pressed to make a change like that without evidence that
> it helps, or at least doesn't bust anything. If you have time, could you run a
> slave in staging which launches with cmd.exe for awhile?

win32-slave03 has been running buildbot using cmd.exe for ~13 hours, attached to sm02. So far so good. A bunch of different builds passed, from nightlies and unittests to release builds and repacks. Let's see what happens over the next few days. The only unusual thing is that cmd.exe's caption changes: http://img217.imageshack.us/img217/4288/screenshotsy.png
Comment 36•14 years ago
|
||
Another straw to grasp at: We modify the registry at every boot with a few OPSI packages. Somehow this could be screwing with the boot, maybe? I've disabled these start up jobs, which aren't technically required, anyways.
Comment 37•14 years ago
|
||
This patch uses 'set -x', signal trapping, and redirection of all output to a file to hopefully glean some information about how or why the shell script is dying. I wanted to use:
exec > >(tee -a $LOG) 2>&1
to log to the console, too, but apparently that particular construction is not implemented in MSYS' bash. Probably doesn't matter, since the shell window closes after sh dies.
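As a rough illustration of the three techniques named above (command tracing, signal trapping, and redirecting all output to a file), here is a sketch that avoids process substitution. It is not the attached patch: the `LOG` default path and the function names `log_event` and `run_traced` are assumptions for the example.

```shell
#!/bin/bash
# Hypothetical sketch of the logging approach: trace every command, record
# signals and the exit status, and send all output to one file.

LOG=${LOG:-/tmp/buildbot-startup-demo.log}   # assumed path, for the demo only

log_event() {
    echo "$(date) - $*" >> "$LOG"
}

run_traced() {
    (
        # Everything in this subshell, stdout and stderr, goes to the log.
        exec >> "$LOG" 2>&1
        trap 'log_event "caught SIGTERM"; exit 143' TERM
        trap 'log_event "caught SIGHUP"; exit 129' HUP
        trap 'log_event "exiting with status $?"' EXIT
        set -x          # trace each command as it runs (trace goes to stderr)
        "$@"
        status=$?
        set +x
        exit "$status"
    )
}
```

Because the EXIT trap records the final status, a log that ends without an "exiting with status" line would suggest the shell was killed in a way it could not trap.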
Attachment #439607 -
Flags: review?(catlee)
Assignee | ||
Comment 38•14 years ago
|
||
How about trying the following approach:
=== Snippet ===
cd %USERPROFILE%

:start
"%MOZILLABUILD%\msys\bin\bash" --login -c /d/mozilla-build/buildbot-start.sh
"%MOZILLABUILD%\msys\bin\sleep" 30

goto start
=== Snippet ===
In this case we have some kind of self-recovery.

win32-slave03 has been running this script for more than 24 hours without any visible regression so far. We can try this script on one of the problematic boxes and increase the sleep period so that nagios can detect the failure.

Another approach is to convert the bat file to an exe (no FOSS converter found, freeware only) and run it as a service. Windows® can monitor and restart its services in case of failure.
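The same relaunch-forever idea, rendered in shell for comparison with the batch snippet. This is a sketch under stated assumptions: the function name `supervise` and the `max_restarts` argument are invented here (the batch file has no such limit and loops via `goto start`); the limit exists only so the example can terminate.

```shell
#!/bin/bash
# Hypothetical shell rendering of the restart loop from the batch snippet.

supervise() {
    local pause=$1          # seconds to wait before relaunching
    local max_restarts=$2   # 0 means loop forever, like the batch file
    shift 2                 # remaining arguments are the command to supervise

    local launches=0
    while true; do
        local status=0
        "$@" || status=$?   # blocks until the supervised command exits
        launches=$((launches + 1))
        echo "launch $launches exited with status $status"
        if (( max_restarts > 0 && launches >= max_restarts )); then
            return 0
        fi
        sleep "$pause"
    done
}
```

With a longer pause, a monitoring check has a window in which the process is visibly absent, which is the reason given above for increasing the sleep period.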
Comment 39•14 years ago
|
||
(In reply to comment #38)
> How about trying the following approach:
>
> === Snippet ===
> cd %USERPROFILE%
>
> :start
> "%MOZILLABUILD%\msys\bin\bash" --login -c /d/mozilla-build/buildbot-start.sh
> "%MOZILLABUILD%\msys\bin\sleep" 30
>
> goto start
> === Snippet ===
>
> In this case we have some kind of self-recovery.
>
> win32-slave03 has been running this script for more than 24 hours without any
> visible regression so far. We can try this script on one of the problematic
> boxes and increase the sleep period so that nagios can detect the failure.

This is absolutely worth a try. Can you post a patch for this?

> Another approach is to convert the bat file to an exe (no FOSS converter found,
> freeware only) and run it as a service. Windows® can monitor and restart its
> services in case of failure.

The service approach doesn't work because Buildbot doesn't know how to launch processes on the Desktop. We'd need to do Buildbot hacking first.
Comment 40•14 years ago
|
||
Comment on attachment 437617 [details] [diff] [review] buildbot startup, rev2 This patch ended up busting some try slaves that rebooted (for some reason). They were using /e/builds/sendchange-slave. I switched them to use /e/builds/slave rather than add on to this patch. There might be other try slaves that need this change, too.
Assignee | ||
Comment 41•14 years ago
|
||
(In reply to comment #39) > This is absolutely worth a try. Can you post a patch for this? Use endless loop without launching rxvt minimized.
Attachment #439815 -
Flags: review?(bhearsum)
Updated•14 years ago
|
Attachment #439607 -
Flags: review?(catlee) → review+
Assignee | ||
Comment 42•14 years ago
|
||
I've added create-shortcut.vbs, which creates a shortcut in the Startup group that runs minimized. Tested on win32-slave03.b.m.o. buildbot-startup.ins removes the old menu entry as well.
Attachment #439815 -
Attachment is obsolete: true
Attachment #439961 -
Flags: feedback?(bhearsum)
Attachment #439815 -
Flags: review?(bhearsum)
Assignee | ||
Comment 43•14 years ago
|
||
Attachment #439961 -
Attachment is obsolete: true
Attachment #439961 -
Flags: feedback?(bhearsum)
Assignee | ||
Comment 44•14 years ago
|
||
Attachment #440215 -
Flags: review?(bhearsum)
Assignee | ||
Updated•14 years ago
|
Attachment #440214 -
Flags: review?(bhearsum)
Updated•14 years ago
|
Attachment #440215 -
Flags: review?(bhearsum) → review+
Comment 45•14 years ago
|
||
Comment on attachment 440214 [details] [diff] [review] start-buildbot.bat: use endless loop, run without rxvt Assuming the build currently running on win32-slave03 finishes successfully, let's land these later today.
Attachment #440214 -
Flags: review?(bhearsum) → review+
Comment 46•14 years ago
|
||
Comment on attachment 439607 [details] [diff] [review] better logging, signal catching from buildbot sh script changeset: 45:5f1bc7cfec25
Attachment #439607 -
Flags: checked-in+
Comment 47•14 years ago
|
||
Comment on attachment 440214 [details] [diff] [review] start-buildbot.bat: use endless loop, run without rxvt changeset: 46:a0ded5a249ce
Attachment #440214 -
Flags: checked-in+
Assignee | ||
Updated•14 years ago
|
Attachment #440215 -
Flags: checked-in+
Assignee | ||
Comment 48•14 years ago
|
||
Comment on attachment 440215 [details] [diff] [review] Run start-buildbot.bat minimized
Checking in buildbot-startup/buildbot.bat;
/mofo/opsi-binaries/buildbot-startup/buildbot.bat,v <-- buildbot.bat
new revision: 1.3; previous revision: 1.2
done
Comment 49•14 years ago
|
||
Rail set the buildbot-startup package to deploy on all of the build machines again.
Assignee | ||
Comment 50•14 years ago
|
||
Seems like start-buildbot.sh hasn't deployed:
Error: copy of P:\install\buildbot-startup\start-buildbot.sh to d:\mozilla-build\start-buildbot.sh not possible. File Err. No. 32 (The process cannot access the file because it is being used by another process)
Errorcode 32 ("The process cannot access the file because it is being used by another process")
Very strange to see a user startup process running *before* OPSI client.
Assignee | ||
Comment 51•14 years ago
|
||
Just compare the times when opsi preloginloader and buildbot start. This shouldn't happen imho.
c:\tmp\buildbot-startup.log snippet
-------------------------------------
"Tue 04/20/2010 14:39:56.56 - Sleeping at the end of buildbot.bat"
"Tue 04/20/2010 14:40:27.73 - About to run start-buildbot.bat"
"Tue 04/20/2010 14:40:27.84 - About to call guess-msvc.bat"
"Tue 04/20/2010 14:40:27.84 - Start of guess-msvc.bat"
"Tue 04/20/2010 14:40:27.85 - About to query MSVC8KEY"
"Tue 04/20/2010 14:40:27.87 - Queried, VC8DIR is "
"Tue 04/20/2010 14:40:27.96 - End of guess-msvc.bat"
"Tue 04/20/2010 14:40:27.96 - Calling vcvars32.bat in VC8DIR"
"Tue 04/20/2010 14:40:28.03 - About to run start-buildbot.sh"
"Tue 04/20/2010 14:40:28.06 - start-buildbot.sh finished"
-------------------------------------
c:\tmp\instlog.txt snippet:
-------------------------------------
============ Version 4.8.8.1 WIN32 script "P:\install\buildbot-startup\buildbot-startup.ins"
start: 2010-04-20 14:40:28 (on client named as : "mw32-ix-slave23.uib.local")
[executing: "C:\Program Files\opsi.org\preloginloader\opsi-winst\winst32.exe"]
system infos:
mw32-ix-slave23.build.mozilla.org
-------------------------------------
Assignee | ||
Comment 52•14 years ago
|
||
Probably we should install loginblocker on these machines. I can see some differences between win32-slave03 and ix-slave23: HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\WinLogon\GinaDLL is set on win32-slave03 and doesn't exist at all on the ix machine. Seems like we have the same opsi version but not the same login behavior.
Randomly checked slaves:
Problematic ones:
mw32-ix-slave01: no GinaDLL registry entry
mw32-ix-slave05: no GinaDLL registry entry
mw32-ix-slave23: no GinaDLL registry entry
moz2-win32-slave50: no GinaDLL registry entry
moz2-win32-slave54: no GinaDLL registry entry
Stable ones:
moz2-win32-slave05: has a GinaDLL registry entry
win32-slave03: has a GinaDLL registry entry
moz2-win32-slave40: has a GinaDLL registry entry
moz2-win32-slave49: has a GinaDLL registry entry
If loginblocker fixes this issue I want my beer! :)
Assignee | ||
Comment 53•14 years ago
|
||
Snippet from preloginloader.ins:
----------------------------------------
comment "copying loginblocker"
if ($INST_MinorOS$ = "Windows Vista")
    if ($INST_system_bit$ = "64")
        Files_copy_vista_loginblocker_64
        DosInAnIcon_vista64_loginblocker
        ExecWith_vista64_loginblocker "%systemroot%\cmd64.exe" /c
    else
        Files_copy_vista_loginblocker_32
        Files_del_cmd64
    endif
endif
if (($INST_MinorOS$ = "WinXP") or ($INST_MinorOS$ = "Win2k"))
    if ($INST_system_bit$ = "64")
        Files_copy_xp_loginblocker_64
    else
        Files_copy_xp_loginblocker_32
        Files_del_cmd64
    endif
endif
----------------------------------------
There is no win2k3 check (all checks fail, you can look at c:\tmp\instlog.txt), so no pgina.dll is going to be installed.
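To make the gap easier to see, the branch structure of the snippet above can be modelled in a few lines of shell. This is only a model for illustration, not opsi-winst code: the function name `loginblocker_section` is invented, it prints just the file-copy section that would run, and the value "Win2k3" used in the usage note is a hypothetical stand-in, since the actual string reported for Windows 2003 is not shown in this bug.

```shell
#!/bin/bash
# Hypothetical model of the OS checks in the preloginloader.ins snippet.
# Only "Windows Vista", "WinXP" and "Win2k" appear in the original script.

loginblocker_section() {
    local minor_os=$1 bits=$2
    case "$minor_os" in
        "Windows Vista")
            if [ "$bits" = "64" ]; then echo "Files_copy_vista_loginblocker_64"
            else echo "Files_copy_vista_loginblocker_32"; fi ;;
        "WinXP"|"Win2k")
            if [ "$bits" = "64" ]; then echo "Files_copy_xp_loginblocker_64"
            else echo "Files_copy_xp_loginblocker_32"; fi ;;
        *)
            echo "none" ;;   # no branch matches, so nothing is copied
    esac
}
```

Calling `loginblocker_section "Win2k3" 32` prints `none` in this model, which mirrors the observation that no loginblocker files get installed when none of the OS checks match.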
Comment 54•14 years ago
|
||
Rail has some fresh ideas here, passing this bug off to him :)
Assignee: bhearsum → rail
Assignee | ||
Comment 55•14 years ago
|
||
Seems like the main problem is the preloginloader opsi package, which doesn't install its library and registry entries on Windows 2003. The package itself is somewhat complicated, so extracting the loginblocker-related pieces and creating a new package would be a bit risky. I'd prefer to patch the preloginloader package and reinstall it, at least as a short term fix.
Attachment #440470 -
Flags: review?(bhearsum)
Assignee | ||
Comment 56•14 years ago
|
||
Comment 57•14 years ago
|
||
Comment on attachment 440470 [details] [diff] [review] preloginloader opsi package patch Awesome work here, Rail.
Attachment #440470 -
Flags: review?(bhearsum) → review+
Comment 58•14 years ago
|
||
Comment on attachment 440470 [details] [diff] [review] preloginloader opsi package patch
Rail landed this a short while ago. We also landed a follow-up patch to disable the forced reboots after the installation finishes, since jobs could be running while this happens:
RCS file: /mofo/opsi-binaries/preloginloader/CLIENT_DATA/files/opsi/preloginloader.ins,v
retrieving revision 1.2
retrieving revision 1.3
diff -u -r1.2 -r1.3
--- files/opsi/preloginloader.ins 23 Apr 2010 19:39:23 -0000 1.2
+++ files/opsi/preloginloader.ins 23 Apr 2010 19:48:35 -0000 1.3
@@ -402,9 +402,11 @@
 sub_clean_up
 ; all is done but make a reboot after terminating with the script
- if ($INST_AllowReboot$ = "true")
- ExitWindows /Reboot
- endif
+ ; Commented out so we can roll out to machines which aren't
+ ; properly blocking the login.
+ ;if ($INST_AllowReboot$ = "true")
+ ; ExitWindows /Reboot
+ ;endif
 endif ; diskspace
 endif ; correct OS Version
Attachment #440470 -
Flags: checked-in+
Comment 59•14 years ago
|
||
We rolled out the updated preloginloader to mw32-ix-slave02, 03, and win32-slave50 and 51. If these slaves stay up over the weekend, we'll roll out to the rest of them. Go Rail!
Comment 60•14 years ago
|
||
We marked the new preloginloader for install on the rest of the ix slaves and VMs (ix 04-25, VMs 01-49; 52-59, and the try VMs).
Comment 61•14 years ago
|
||
Most of the machines have picked up the new preloader. I also rebooted both ref images, and they now have it.
Comment 62•14 years ago
|
||
Occasionally, we're seeing the following in an OPSI dialog, which ends up blocking the login until it is clicked through: Zeitüberschreitung bei verbindung -- which translates to "Connection timeout". We lowered the connection timeout for OPSI down to 20 seconds in bug 522078 -- we could try bumping it back up to a minute or so.
Reporter | ||
Comment 63•14 years ago
|
||
nagios complained about mw32-ix-slave10 today, on inspection the screen saver was running but it logged on soon after I connected.
Comment 64•14 years ago
|
||
(In reply to comment #63)
> nagios complained about mw32-ix-slave10 today, on inspection the screen saver
> was running but it logged on soon after I connected.

Hmmm, Rail and I logged on before you and clicked through the OPSI dialog described in comment #62. We should bump the timeout for OPSI to avoid that dialog overall, but I don't know why or how they're getting stuck on the screensaver, I thought that was fixed :-(.
Reporter | ||
Comment 65•14 years ago
|
||
(This is probably the same but more data for the mill)
* mw32-ix-slave02 - nagios alerts going for 19 hours over weekend
* screensaver showing on first connection with VNC
* two dialogs showing after screensaver gone
* topmost is 'Eventlog Service' complaining: The 'opsi Log' is full. If this is the first time you have seen this message, take the following steps: 1. Click Start, click Run, type "eventvwr", and then click OK 2. Click opsi, click the Action menu, click Clear All Events, and then click No. If this dialog reappears, contact your helpdesk or system administrator
* lower dialog is the wInst-Message dialog of comment #62
Assignee | ||
Comment 66•14 years ago
|
||
IIRC, the first dialog (opsi Log is full) is not blocking, but in any case we should clean up old logs somehow.
Comment 67•14 years ago
|
||
Yeah, the 'log is full dialog' disappears on its own after the login finishes.
Assignee | ||
Updated•14 years ago
|
Whiteboard: [buildslaves][opsi]
Assignee | ||
Updated•14 years ago
|
Status: NEW → RESOLVED
Closed: 14 years ago
Resolution: --- → FIXED
Updated•11 years ago
|
Product: mozilla.org → Release Engineering