Closed Bug 853112 Opened 13 years ago Closed 13 years ago

A job on Windows XP slaves can take many hours because a SIGKILL did not kill the process

Categories

(Infrastructure & Operations Graveyard :: CIDuty, task)

Platform: x86, macOS
Type: task
Priority: Not set
Severity: normal

Tracking

(Not tracked)

RESOLVED DUPLICATE of bug 854553

People

(Reporter: armenzg, Unassigned)

Details

For instance, this job:
http://buildbot-master46.build.scl1.mozilla.com:8201/builders/Rev3%20WINNT%205.1%20mozilla-inbound%20debug%20test%20crashtest/builds/35

The SIGKILL was sent successfully. [1] I was hoping that deploying _dumbwin32proc.py would help with situations like this:
http://mxr.mozilla.org/build/source/buildbot/slave/buildslave/runprocess.py#723

I don't see anything in the Windows Event Viewer. Any ideas on what I could look into?

If I kill the browser running the reftests, I am told that an unresponsive Firefox was killed by me and asked whether I want to report it. Doing these two manual steps did not cause the job to progress. Hitting "stop build" on buildbot made us go forward. [2]

##################

On another slave with the same symptom I hit "stop build" from the beginning:
http://buildbot-master48.build.scl1.mozilla.com:8201/builders/Rev3%20WINNT%205.1%20try%20debug%20test%20crashtest/builds/17

This approach did nothing either. [3] Killing the browser manually did not make us recover either. This time I actually had to run "shutdown -f -r -t 0".

Any suggestions? I assume briar patch would eventually reboot the machines, but it would be interesting to figure out what is going on.

[1]
2013-03-19 19:12:29-0700 [Broker,client] in dir c:\talos-slave\test\. (timeout 1200 secs) (maxTime 7200 secs)
2013-03-19 19:12:29-0700 [Broker,client] watching logfiles {}
2013-03-19 19:12:29-0700 [Broker,client] argv: ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/desktop_unittest.py', '--cfg', 'unittests/win_unittest.py', '--reftest-suite', 'crashtest', '--download-symbols', 'true']
2013-03-19 19:12:29-0700 [Broker,client] environment: {'TMP': 'C:\\DOCUME~1\\cltbld\\LOCALS~1\\Temp', 'MOZILLABUILD': 'D:\\mozilla-build', 'COMPUTERNAME': 'TALOS-R3-XP-035', 'MOZ_NO_REMOTE': '1', 'USERDOMAIN': 'TALOS-R3-XP-035', 'LIBPATH': 'C:\\WINDOWS\\Microsoft.NET\\Framework\\v2.0.50727;D:\\msvs8\\VC\\ATLMFC\\LIB', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files', 'MOZILLABUILDPATH': '\\mozilla-build\\', 'PROCESSOR_IDENTIFIER': 'x86 Family 6 Model 23 Stepping 10, GenuineIntel', 'PROGRAMFILES': 'C:\\Program Files', 'PROCESSOR_REVISION': '170a', 'SYSTEMROOT': 'C:\\WINDOWS', 'PATH': 'C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;c:\\Program Files\\gnuwin32\\bin;c:\\Python24\\;c:\\Python24\\scripts;c:\\Program Files\\Vim\\vim72', 'NO_EM_RESTART': '1', 'MSVCDir': 'D:\\msvs8\\VC', 'XPCOM_DEBUG_BREAK': 'warn', 'TEMP': 'C:\\DOCUME~1\\cltbld\\LOCALS~1\\Temp', 'PROCESSOR_ARCHITECTURE': 'x86', 'VCVARS': 'D:\\msvs8\\VC\\bin\\vcvars32.bat', 'VSINSTALLDIR': 'D:\\msvs8', 'ALLUSERSPROFILE': 'C:\\Documents and Settings\\All Users', 'DevEnvDir': 'D:\\msvs8\\Common7\\IDE', 'MOZILLABUILDDRIVE': 'C:', 'SESSIONNAME': 'Console', 'HOMEPATH': '\\Documents and Settings\\cltbld', 'FrameworkDir': 'C:\\WINDOWS\\Microsoft.NET\\Framework', 'MOZ_HIDE_RESULTS_TABLE': '1', 'FrameworkVersion': 'v2.0.50727', 'USERNAME': 'cltbld', 'LOGONSERVER': '\\\\TALOS-R3-XP-035', 'PROMPT': '$P$G', 'COMSPEC': 'C:\\WINDOWS\\system32\\cmd.exe', 'MOZ_TOOLS': 'D:\\mozilla-build\\moztools', 'BOOTMODE': 'BKSTD', 'NO_FAIL_ON_TEST_ERRORS': '1', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH', 'CLIENTNAME': 'Console', 'FP_NO_HOST_CHECK': 'NO', 'WINDIR': 'C:\\WINDOWS', 'HOMEDRIVE': 'C:', 'APPDATA': 'C:\\Documents and Settings\\cltbld\\Application Data', 'MOZ_AIRBAG': '1', 'SYSTEMDRIVE': 'C:', 'MOZ_MSVCVERSION': '8', 'NUMBER_OF_PROCESSORS': '2', 'PWD': 'c:\\talos-slave\\test', 'PROCESSOR_LEVEL': '6', 'PROPERTIES_FILE': 'c:\\talos-slave\\test/buildprops.json', 'MOZ_CRASHREPORTER_NO_REPORT': '1', 'VCINSTALLDIR': 'D:\\msvs8\\VC', 'OS': 'Windows_NT', 'FrameworkSDKDir': 'D:\\msvs8\\SDK\\v2.0', 'USERPROFILE': 'C:\\Documents and Settings\\cltbld'}
2013-03-19 19:12:29-0700 [Broker,client] using PTY: False
2013-03-19 21:12:29-0700 [-] command timed out: 7200 seconds elapsed, attempting to kill
2013-03-19 21:12:29-0700 [-] trying process.signalProcess('KILL')
2013-03-19 21:12:29-0700 [-] signal KILL sent successfully

[2] after hitting "stop build"
2013-03-19 21:12:29-0700 [-] signal KILL sent successfully
2013-03-20 11:43:06-0700 [Broker,client] asked to interrupt current command: The web-page 'stop build' button was pressed by '<unknown>':
2013-03-20 11:43:08-0700 [Broker,client] command interrupted, attempting to kill
2013-03-20 11:43:08-0700 [Broker,client] trying process.signalProcess('KILL')
2013-03-20 11:43:09-0700 [Broker,client] Process exited already - can't kill
2013-03-20 11:43:09-0700 [Broker,client] signalProcess/os.kill failed both times
2013-03-20 11:43:14-0700 [-] we tried to kill the process, and it wouldn't die.. finish anyway
2013-03-20 11:43:14-0700 [-] RunProcess.failed: command failed: SIGKILL failed to kill process
2013-03-20 11:43:15-0700 [-] SlaveBuilder.commandFailed <buildslave.commands.shell.SlaveShellCommand instance at 0x0186BC88>
2013-03-20 11:43:15-0700 [-] Unhandled Error
Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
2013-03-20 11:43:18-0700 [Broker,client] startCommand:shell [id 161169]

[3]
2013-03-19 22:59:16-0700 [-] command timed out: 7200 seconds elapsed, attempting to kill
2013-03-19 22:59:16-0700 [-] trying process.signalProcess('KILL')
2013-03-19 22:59:16-0700 [-] signal KILL sent successfully
2013-03-20 11:46:05-0700 [Broker,client] asked to interrupt current command: The web-page 'stop build' button was pressed by '<unknown>':
2013-03-20 11:46:05-0700 [Broker,client] command interrupted, attempting to kill
2013-03-20 11:46:05-0700 [Broker,client] trying process.signalProcess('KILL')
2013-03-20 11:46:05-0700 [Broker,client] Process exited already - can't kill
2013-03-20 11:46:05-0700 [Broker,client] signalProcess/os.kill failed both times
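
To put a number on how long a slave sat idle after the kill, the log excerpts above can be scanned for the "signal KILL sent successfully" line and the next timestamped line that follows it. The following is only a minimal sketch in Python, assuming every log line of interest starts with a timestamp in the format shown in the excerpts. The names `parse_stamp` and `stalls_after_kill` are hypothetical helpers written for this illustration; they are not part of buildbot.

```python
import re
from datetime import datetime

# Timestamp prefix as it appears in the quoted slave log lines,
# e.g. "2013-03-19 21:12:29-0700". Assumed format, taken from the excerpts.
STAMP = re.compile(r"^(\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2}[+-]\d{4}) ")
KILL_SENT = "signal KILL sent successfully"


def parse_stamp(line):
    """Return the timezone-aware timestamp at the start of a line, or None."""
    match = STAMP.match(line)
    if match is None:
        return None
    return datetime.strptime(match.group(1), "%Y-%m-%d %H:%M:%S%z")


def stalls_after_kill(lines):
    """Yield (kill_time, next_time, gap) for each reported KILL.

    gap is the timedelta between the most recent "signal KILL sent
    successfully" line and the first later timestamped line that is not
    itself a KILL report. Lines without a timestamp are skipped.
    """
    kill_time = None
    for line in lines:
        stamp = parse_stamp(line)
        if stamp is None:
            continue
        if KILL_SENT in line:
            kill_time = stamp
        elif kill_time is not None:
            yield kill_time, stamp, stamp - kill_time
            kill_time = None


# Two lines copied from log [2] above (second line shortened).
EXCERPT = [
    "2013-03-19 21:12:29-0700 [-] signal KILL sent successfully",
    "2013-03-20 11:43:06-0700 [Broker,client] asked to interrupt current command",
]

for kill_time, next_time, gap in stalls_after_kill(EXCERPT):
    print(kill_time, "->", next_time, "stalled for", gap)
```

For the two lines from log [2], the gap reported is 14:30:37, that is, the slave did nothing for about fourteen and a half hours between the KILL and the "stop build" request. A check like this only measures the stall; it says nothing about why the process did not go away.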
talos-r3-xp-048 hit this (bug 789662).
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Product: mozilla.org → Release Engineering
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard