Closed Bug 533185 Opened 10 years ago Closed 10 years ago

Disable JIT debugger on windows slaves


(Release Engineering :: General, defect, P2, major)

Windows Server 2003


(Not tracked)



(Reporter: cjones, Assigned: nthomas)




(2 files, 1 obsolete file)

Since landed the
"WINNT 5.2 electrolysis unit test" builder has been orange. It's timing out in
make check:
make[4]: Entering directory
Ran 71 tests in 22.047s

make[4]: Leaving directory
make[4]: Entering directory
make[5]: Entering directory
make[5]: Nothing to be done for `check'.
make[5]: Leaving directory
WARNING: RegisterWaitForSingleObject failed: 31: file
line 62
WARNING: Tried to RegisterCallback without an AtExitManager: file
line 40
command timed out: 300 seconds without output

On the slave there is a dialog box up:
 nsAppShell:EventWIndow: mozilla-runtime.exe - Application Error
 The instruction at "0x7c81a379" referenced memory at "0x0000009c" The memory
could not be "written".

Unusually, this also leads to buildbot dying and we lose a slave for the
electrolysis, tracemonkey, and mozilla-1.9.1 pool.

Hrm, I thought the buildbot slaves had this dialog disabled at the Windows
level. This is done with System Properties/Advanced/Error Reporting: "Disable
Error Reporting" and uncheck "But notify me..."
I'm trying this suggestion on moz2-win32-slave53. The dialog box goes on to say "Click OK to terminate, Cancel to debug" so it may be a different setting to change.
With error reporting disabled, we got this:
* mozilla-runtime.exe crashes with log like comment #0
* Windows displays an "application has been terminated" dialog, only option is OK button
* VNC server is dead, so I connected to the console with VI
* the console shows the same dialog as described in comment #0 and #1 (OK to terminate, Cancel to Debug)
* buildbot log has "Received SIGBREAK, shutting down" followed by it killing the call to "make check", some "lost remote" messages, and shutting down buildbot. This is the normal output from rebooting after each build.

So it looks like the debug prompt blocks the reboot. This is probably the first time a make check binary has started crashing since we enabled reboots after every build, so we may not have noticed this before.

moz2-win32-slave53 has error reporting re-enabled and returned to the pool.
(In reply to comment #2)
> So it looks like the debug prompt blocks the reboot. 

Indeed it does. Deleting
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug\Debugger
and changing 
 HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug\Auto
from 0 to 1 results in a clean reboot.
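
A minimal sketch of the same change expressed as a .reg file, for anyone wanting to apply it by hand. The function name `build_disable_jit_reg` is hypothetical, and the sketch assumes Auto is stored as a string value (REG_SZ); check the existing type on the slave before importing. The script only builds the file text and never touches the registry itself.

```python
# Sketch only: builds the text of a .reg file; it does not modify the registry.
AEDEBUG_KEY = (
    r"HKEY_LOCAL_MACHINE\SOFTWARE\Microsoft\Windows NT\CurrentVersion\AeDebug"
)


def build_disable_jit_reg(key: str = AEDEBUG_KEY) -> str:
    """Return .reg content that deletes Debugger and sets Auto to 1."""
    lines = [
        "Windows Registry Editor Version 5.00",
        "",
        f"[{key}]",
        '"Debugger"=-',  # "name"=- deletes a value in .reg syntax
        '"Auto"="1"',    # assumption: Auto is a REG_SZ string, not a DWORD
        "",
    ]
    # .reg files conventionally use CRLF line endings
    return "\r\n".join(lines)


if __name__ == "__main__":
    print(build_disable_jit_reg())
```

Saving the output to a file and importing it with regedit should reproduce the manual steps above, but this has not been tried on the slaves in this bug.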

Assignee: nobody → nrthomas
Summary: Invalid memory reads in tests can lead to disconnected build slaves → Disable JIT debugger on windows slaves
For posterity.
Attachment #416368 - Attachment mime type: application/octet-stream → text/plain
Attached patch WIP (obsolete)
Installs OK, just having some issues with setting Debugger again on uninstall. The error message is:
 Section Registry_enable_jit (command on line 11):
 set "Debugger"="\"C:\\WINDOWS\\system32\\vsjitdebugger.exe\" -p %ld -e %ld"
 Char(s) at end of line not interpreted
This could also delete VDebugger, which is left over from the original attempt to do this,
but it's not obvious what winSt does if you try to delete a key that doesn't exist.
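
For reference, the quoting in the failing line matches regedit-style escaping, where backslashes and double quotes inside a string value are each prefixed with a backslash. A small sketch of that escaping follows; `reg_quote` and `reg_string_line` are hypothetical helper names, and whether winSt expects exactly this syntax is not established here.

```python
def reg_quote(value: str) -> str:
    """Quote a string the way .reg files do: escape backslashes, then quotes."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def reg_string_line(name: str, value: str) -> str:
    """Build a "name"="value" line for a string registry value."""
    return f"{reg_quote(name)}={reg_quote(value)}"


# The unescaped debugger command, as it would appear in the registry itself
debugger_cmd = r'"C:\WINDOWS\system32\vsjitdebugger.exe" -p %ld -e %ld'
line = reg_string_line("Debugger", debugger_cmd)

if __name__ == "__main__":
    print(line)
```

The resulting line is the same text that follows `set` in the error output, so the escaping itself looks consistent; the parse error may instead come from how the section handles the trailing part of the line.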
Attached patch Working version
Compared to the WIP, this also removes the VDebugger key, which was left over from a failed earlier attempt to do this, and has a mostly-correct uninstall. The Debugger key gets created with a REG_EXPAND_SZ type instead of REG_SZ. Since I tried specifying REG_SZ in enable-jit.ins and still got REG_EXPAND_SZ, I'm going to assume that Windows insists on that type and that it will work like that.

I'm going to install this on the three staging slaves and check for really obvious bustage.
Attachment #416380 - Attachment is obsolete: true
Attachment #421582 - Flags: review?(bhearsum)
Priority: -- → P2
Attachment #421582 - Flags: review?(bhearsum) → review+
Set to install on the production win32-slaves - it'll get picked up as they complete a job and reboot. I'll check tomorrow for any that haven't got it yet and give them a kick.

All try-win32-slave* set to update. Still to do: 3 10 11 16 19-21 23-27 29

Win32 Ref platform VM updated (version 24) and docs updated. ix-ref not set to update; Ben, do we have a good place to track updating the new boxes?
All current slaves updated. Leaving open for answer to question in comment #8.
I just filed bug 541953 to track whatever delta there will be when these slaves arrive.
Thanks Ben.
Closed: 10 years ago
Resolution: --- → FIXED
Blocks: 652391
Product: → Release Engineering