Closed
Bug 853112
Opened 13 years ago
Closed 13 years ago
A job on Windows XP slaves can take many hours because a SIGKILL did not kill the process
Categories
(Infrastructure & Operations Graveyard :: CIDuty, task)
Tracking
(Not tracked)
RESOLVED
DUPLICATE
of bug 854553
People
(Reporter: armenzg, Unassigned)
Details
For instance this:
http://buildbot-master46.build.scl1.mozilla.com:8201/builders/Rev3%20WINNT%205.1%20mozilla-inbound%20debug%20test%20crashtest/builds/35
The SIGKILL was sent successfully [1]. I was hoping that deploying _dumbwin32proc.py would help in situations like this.
http://mxr.mozilla.org/build/source/buildbot/slave/buildslave/runprocess.py#723
I don't see anything in the Windows Event Viewer.
Any ideas on what I could look into?
If I kill the browser running the reftests, I get a prompt telling me that I killed an unresponsive Firefox and asking whether I want to report it.
These two manual steps did not make the job progress.
Hitting "stop build" on buildbot is what finally let the job move forward. [2]
##################
On another slave with the same symptom, I hit "stop build" as the first step:
http://buildbot-master48.build.scl1.mozilla.com:8201/builders/Rev3%20WINNT%205.1%20try%20debug%20test%20crashtest/builds/17
That did nothing either. [3]
Killing the browser manually did not make the slave recover either.
This time I actually had to run "shutdown -f -r -t 0".
Any suggestions?
I assume briar patch would eventually reboot the machines, but it would be interesting to figure out what is going on.
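One thing the symptoms hint at: the KILL reached the process buildbot launched, yet the browser could still be killed by hand afterwards, so at least one child process outlived it. A minimal sketch, assuming surviving children are what keeps the step from finishing, of force-killing the whole process tree with Windows' `taskkill`. The helpers `taskkill_tree_argv` and `kill_tree` are hypothetical and not part of buildbot, and this is Python 3 while the slaves ran Python 2.7; whether `taskkill` is present on a given XP image is also an assumption.

```python
import subprocess
import sys


def taskkill_tree_argv(pid):
    """Build a taskkill command line that force-kills pid and its children."""
    # /PID selects the process, /T includes its child processes,
    # /F forces termination instead of asking the process to close.
    return ["taskkill", "/PID", str(int(pid)), "/T", "/F"]


def kill_tree(pid, timeout=30):
    """Force-kill a process tree on Windows.

    Returns True if taskkill exited with status 0. Hypothetical helper:
    it only shells out to taskkill and does not touch buildbot's runprocess.py.
    """
    if sys.platform != "win32":
        raise RuntimeError("taskkill is only available on Windows")
    result = subprocess.run(
        taskkill_tree_argv(pid), capture_output=True, timeout=timeout
    )
    return result.returncode == 0
```

Calling `kill_tree(pid)` with the PID of the top-level `python` process would target the harness and the browser together, rather than only the parent.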
[1]
2013-03-19 19:12:29-0700 [Broker,client] in dir c:\talos-slave\test\. (timeout 1200 secs) (maxTime 7200 secs)
2013-03-19 19:12:29-0700 [Broker,client] watching logfiles {}
2013-03-19 19:12:29-0700 [Broker,client] argv: ['c:/mozilla-build/python27/python', '-u', 'scripts/scripts/desktop_unittest.py', '--cfg', 'unitte
sts/win_unittest.py', '--reftest-suite', 'crashtest', '--download-symbols', 'true']
2013-03-19 19:12:29-0700 [Broker,client] environment: {'TMP': 'C:\\DOCUME~1\\cltbld\\LOCALS~1\\Temp', 'MOZILLABUILD': 'D:\\mozilla-build', 'COMPUT
ERNAME': 'TALOS-R3-XP-035', 'MOZ_NO_REMOTE': '1', 'USERDOMAIN': 'TALOS-R3-XP-035', 'LIBPATH': 'C:\\WINDOWS\\Microsoft.NET\\Framework\\v2.0.50727;D:
\\msvs8\\VC\\ATLMFC\\LIB', 'COMMONPROGRAMFILES': 'C:\\Program Files\\Common Files', 'MOZILLABUILDPATH': '\\mozilla-build\\', 'PROCESSOR_IDENTIFIER'
: 'x86 Family 6 Model 23 Stepping 10, GenuineIntel', 'PROGRAMFILES': 'C:\\Program Files', 'PROCESSOR_REVISION': '170a', 'SYSTEMROOT': 'C:\\WINDOWS'
, 'PATH': 'C:\\WINDOWS\\system32;C:\\WINDOWS;C:\\WINDOWS\\System32\\Wbem;c:\\Program Files\\gnuwin32\\bin;c:\\Python24\\;c:\\Python24\\scripts;c:\\
Program Files\\Vim\\vim72', 'NO_EM_RESTART': '1', 'MSVCDir': 'D:\\msvs8\\VC', 'XPCOM_DEBUG_BREAK': 'warn', 'TEMP': 'C:\\DOCUME~1\\cltbld\\LOCALS~1\
\Temp', 'PROCESSOR_ARCHITECTURE': 'x86', 'VCVARS': 'D:\\msvs8\\VC\\bin\\vcvars32.bat', 'VSINSTALLDIR': 'D:\\msvs8', 'ALLUSERSPROFILE': 'C:\\Documen
ts and Settings\\All Users', 'DevEnvDir': 'D:\\msvs8\\Common7\\IDE', 'MOZILLABUILDDRIVE': 'C:', 'SESSIONNAME': 'Console', 'HOMEPATH': '\\Documents
and Settings\\cltbld', 'FrameworkDir': 'C:\\WINDOWS\\Microsoft.NET\\Framework', 'MOZ_HIDE_RESULTS_TABLE': '1', 'FrameworkVersion': 'v2.0.50727', 'U
SERNAME': 'cltbld', 'LOGONSERVER': '\\\\TALOS-R3-XP-035', 'PROMPT': '$P$G', 'COMSPEC': 'C:\\WINDOWS\\system32\\cmd.exe', 'MOZ_TOOLS': 'D:\\mozilla-
build\\moztools', 'BOOTMODE': 'BKSTD', 'NO_FAIL_ON_TEST_ERRORS': '1', 'PATHEXT': '.COM;.EXE;.BAT;.CMD;.VBS;.VBE;.JS;.JSE;.WSF;.WSH', 'CLIENTNAME':
'Console', 'FP_NO_HOST_CHECK': 'NO', 'WINDIR': 'C:\\WINDOWS', 'HOMEDRIVE': 'C:', 'APPDATA': 'C:\\Documents and Settings\\cltbld\\Application Data',
'MOZ_AIRBAG': '1', 'SYSTEMDRIVE': 'C:', 'MOZ_MSVCVERSION': '8', 'NUMBER_OF_PROCESSORS': '2', 'PWD': 'c:\\talos-slave\\test', 'PROCESSOR_LEVEL': '6
', 'PROPERTIES_FILE': 'c:\\talos-slave\\test/buildprops.json', 'MOZ_CRASHREPORTER_NO_REPORT': '1', 'VCINSTALLDIR': 'D:\\msvs8\\VC', 'OS': 'Windows_
NT', 'FrameworkSDKDir': 'D:\\msvs8\\SDK\\v2.0', 'USERPROFILE': 'C:\\Documents and Settings\\cltbld'}
2013-03-19 19:12:29-0700 [Broker,client] using PTY: False
2013-03-19 21:12:29-0700 [-] command timed out: 7200 seconds elapsed, attempting to kill
2013-03-19 21:12:29-0700 [-] trying process.signalProcess('KILL')
2013-03-19 21:12:29-0700 [-] signal KILL sent successfully
[2] after hitting "stop build"
2013-03-19 21:12:29-0700 [-] signal KILL sent successfully
2013-03-20 11:43:06-0700 [Broker,client] asked to interrupt current command: The web-page 'stop build' button was pressed by '<unknown>':
2013-03-20 11:43:08-0700 [Broker,client] command interrupted, attempting to kill
2013-03-20 11:43:08-0700 [Broker,client] trying process.signalProcess('KILL')
2013-03-20 11:43:09-0700 [Broker,client] Process exited already - can't kill
2013-03-20 11:43:09-0700 [Broker,client] signalProcess/os.kill failed both times
2013-03-20 11:43:14-0700 [-] we tried to kill the process, and it wouldn't die.. finish anyway
2013-03-20 11:43:14-0700 [-] RunProcess.failed: command failed: SIGKILL failed to kill process
2013-03-20 11:43:15-0700 [-] SlaveBuilder.commandFailed <buildslave.commands.shell.SlaveShellCommand instance at 0x0186BC88>
2013-03-20 11:43:15-0700 [-] Unhandled Error
Traceback (most recent call last):
Failure: exceptions.RuntimeError: SIGKILL failed to kill process
2013-03-20 11:43:18-0700 [Broker,client] startCommand:shell [id 161169]
[3]
2013-03-19 22:59:16-0700 [-] command timed out: 7200 seconds elapsed, attempting to kill
2013-03-19 22:59:16-0700 [-] trying process.signalProcess('KILL')
2013-03-19 22:59:16-0700 [-] signal KILL sent successfully
2013-03-20 11:46:05-0700 [Broker,client] asked to interrupt current command: The web-page 'stop build' button was pressed by '<unknown>':
2013-03-20 11:46:05-0700 [Broker,client] command interrupted, attempting to kill
2013-03-20 11:46:05-0700 [Broker,client] trying process.signalProcess('KILL')
2013-03-20 11:46:05-0700 [Broker,client] Process exited already - can't kill
2013-03-20 11:46:05-0700 [Broker,client] signalProcess/os.kill failed both times
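To put a number on "many hours", the gap between "signal KILL sent successfully" and the "stop build" request can be computed from the timestamps in [2] and [3]. A small sketch, assuming every relevant log line starts with a timestamp in the format shown above; `parse_timestamp` and `stuck_duration` are illustrative names, and the log lines are shortened copies of the ones in this report.

```python
from datetime import datetime

# Timestamp layout used at the start of each slave log line above.
TS_FORMAT = "%Y-%m-%d %H:%M:%S%z"
TS_LENGTH = len("2013-03-19 21:12:29-0700")


def parse_timestamp(line):
    """Parse the leading timestamp of a log line into an aware datetime."""
    return datetime.strptime(line[:TS_LENGTH], TS_FORMAT)


def stuck_duration(kill_line, stop_line):
    """Time elapsed between the KILL line and the 'stop build' line."""
    return parse_timestamp(stop_line) - parse_timestamp(kill_line)


# Excerpt [2]
first = stuck_duration(
    "2013-03-19 21:12:29-0700 [-] signal KILL sent successfully",
    "2013-03-20 11:43:06-0700 [Broker,client] asked to interrupt current command",
)
# Excerpt [3]
second = stuck_duration(
    "2013-03-19 22:59:16-0700 [-] signal KILL sent successfully",
    "2013-03-20 11:46:05-0700 [Broker,client] asked to interrupt current command",
)

print(first)   # 14:30:37
print(second)  # 12:46:49
```

So both slaves sat idle for more than twelve hours after the KILL was reported as sent, on top of the 7200-second maxTime that had already elapsed.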
Comment 1 • 13 years ago (Reporter)
talos-r3-xp-048 hit this (bug 789662).
Updated • 13 years ago (Reporter)
Status: NEW → RESOLVED
Closed: 13 years ago
Resolution: --- → DUPLICATE
Updated • 12 years ago (Assignee)
Product: mozilla.org → Release Engineering
Updated • 8 years ago (Assignee)
Component: Platform Support → Buildduty
Product: Release Engineering → Infrastructure & Operations
Updated • 6 years ago
Product: Infrastructure & Operations → Infrastructure & Operations Graveyard